Programming | Technology Tales

Something to watch with the SYSODSESCAPECHAR automatic SAS macro variable

10th October 2021

Recently, a client of mine updated one of their systems from SAS 9.4 M5 to SAS 9.4 M7. In spite of performing due diligence regarding changes between the maintenance release, a change in behaviour of the SYSODSESCAPECHAR automatic macro variable surprised them. The macro variable captures the assignment of the ODS escape character used to prefix RTF codes for page numbering and other things. That setting is made using an ODS ESCAPECHAR statement like the following:

ods escapechar="~";

In the M5 release, the tilde character in this example was output by the automatic macro variable but that changed in the M7 release to 7E, the hexadecimal code for the same and this tripped up one of their validated macro programs used in output production. The adopted solution was to use the escape sequence (*ESC*) that gave the same outcome that was there before the change. That was less verbose than alternative code changing the hexadecimal code into the expected ASCII character that follows.

data _null_; call symput("new",byte(input("&sysodsescapechar.",hex.))); run;

The above supplies a hexadecimal code to the BYTE function for correct rendering with the SYMPUT routine assigning the resulting value to a macro variable named new. Just using the escape sequence is far more succinct though there is now an added validation need once user pilot testing has completed. In my line of business, the updating of code is the quickest part of many such changes; documentation and testing always take longer.

Online learning

18th April 2021

Recently, I shared my thoughts on learning new computing languages by oneself using books, online research and personal practice. As successful as that can be, there remains a place for getting some actual instruction as well. Maybe that is why so many turn to YouTube, where there is a multitude of video channels offering such possibilities without cost. What I have also discovered is that this is complemented by a host of other providers whose services attract a fee, and there will be a few of those mentioned later in this post. Paying for online courses does mean that you can get the benefit of curation and an added assurance of quality in what appears to be a growing market.

The variation in quality can dog the YouTube approach, and it also can be tricky to find something good, even if the platform does suggest new videos based on what you have been watching. Much of what is found there does take the form of webinars from the likes of the Why R? Foundation, Posit or the NHSR Community. These can be useful, and there are shorter videos from such providers as the Association of Computing Machinery or SAS Users. These do help more if you already have some knowledge about the topic area being discussed, so they may not make the best starting points for someone who is starting from scratch.

Of course, working your way through a good book will help, and it is something that I have been known to do, but supplementing this with one or more video courses really adds to the experience and I have done a few of these on LinkedIn. That part of the professional platform came from the acquisition of Lynda.com and the topic areas range from soft skills like time management through to computing skills courses with R, SAS and Python seeing coverage among the data science portfolio. Even O’Reilly has ventured into the area in an expansion from the book publishing activities for which so many of us know the organisation.

The available online instructor community does not stop at the above since there are others like Degreed, Baeldung, Udacity, Programiz, Udemy, Business Science and Datanovia. Some of these tend towards online education provision that feels more like an online university course and those are numerous as well as you will find through Data Science Central or KDNuggets. Both of these earn income from advertising to pay for featured blog posts and newsletters, while the former also organises regular webinars and was my first port of call when I became curious about the world of data science during the autumn of 2017.

My point of approach into the world of online training has been as a freelance information professional needing to keep up to date with a rapidly changing field. The mix of content that is both free of charge and that which attracts a fee is one that can work. Both kinds do complement each other while possessing their unique advantages and disadvantages. The need to continually expand skills and knowledge never goes away, so it is well worth spending some time working what you are after, since you need to be sure that any training always adds to your own knowledge and skill level.

Self-learning new computing languages

10th April 2021

Over the years, I have taught myself a number of computing languages with some coming in useful for professional work while others came in handy for website development and maintenance. The collection has grown to include HTML, CSS, XML, Perl, PHP and UNIX Shell Scripting. The ongoing pandemic allowed to me added two more to the repertoire: R and Python.

My interest in these arose from my work as an information professional concerned with standardisation and automation of statistical results delivery. To date, the main focus has been on clinical study data but ongoing changes in the life sciences sector could mean that I may need to look further afield so having extra knowledge never hurts. Though I have been a SAS programmer for more than twenty years, its predominance in the clinical research field is not what it was so that I am having to rethink things.

As it happens, I would like to continue working with SAS since it does so much and thoughts of leaving it after me bring sadness. It also helps to know what the alternatives might be and to reject some management hopes about any newcomers, especially with regard to the amount of code being produced and the quality of graphs being created. Use cases need to be assessed dispassionately even when emotions loom behind the scenes.

Both R and Python bring large scripting ecosystems with active communities so the attraction of their adoption makes a deal of sense. SAS is comparable in the scale of its own ecosystem though there are considerable differences and the platform is catching up when it comes to Data Science. The aforementioned open source languages may have had a head start but it seems that others are not standing still either. It is a time to have wider awareness and online conference attendance helps with that.

The breadth of what is available for any programming language more than stymies any attempt to create a truly all encompassing starting point and I have abandoned thoughts of doing anything like that for R. Similarly, I will not even try such a thing for Python. Consequently, this means that my sharing of anything learned will be in the form of discrete postings from time to time, especially given ho easy it is to collect numerous website links for sharing.

The learning has been facilitated by ongoing pandemic restrictions though things are opening up a little now. The pandemic also has given us public data that can be used for practice since much can be gained from having one’s own project instead of completing exercises from a book. Having an interesting data set with which to work is a must and COVID-19 data contain a certain self-interest as well though one always is mindful of the suffering and loss of life that has been happening since the pandemic first took hold.

Generating PNG files in SAS using ODS Graphics

21st December 2019

Recently, I had someone ask me how to create PNG files in SAS using ODS Graphics so I sought out the answer for them. Normally, the suggestion would have been to create RTF or PDF files instead but there was a specific need so a different approach was taken. Adding the something like following lines before an SGPLOT, SGPANEL or SGRENDER procedure does the needful:

ods listing gpath='E:\'; ods graphics / imagename="test" imagefmt=png;

Here, the ODS LISTING statement declares the destination for the desired graphics file while the ODS GRAPHICS statement defines the file name and type. In the above example, the file test.png would be created in the root of the E drive of a Windows machine. However, this also works with Linux or UNIX directory paths.

Performing parallel processing in Perl scripting with the Parallel::ForkManager module

30th September 2019

In a previous post, I described how to add Perl modules in Linux Mint while mentioning that I hoped to add another that discusses the use of the Parallel::ForkManager module. This is that second post and I am going to keep things as simple and generic as they can be. There are other articles like one on the Perl Maven website that go into more detail.

The first thing to do is ensure that the Parallel::ForkManager module is called by your script and having the following line near the top will do just that. Without this step, the script will not be able to find the required module by itself and errors will be generated.

use Parallel::ForkManager;

Then, the maximum number of threads needs to be specified. While that can be achieved using a simple variable declaration, the following line reads this from the command used to invoke the script. It even tells a forgetful user what they need to do in its own terse manner. Here $0 is the name of the script and N is the number of threads. Not all these threads will get used and processing capacity will limit how many actually are in use so there is less chance of overwhelming a CPU.

my $forks = shift or die "Usage: $0 N\n";

Once the maximum number of available threads is known, the next step is to instantiate the Parallel::ForkManager object as follows to use these child processes:

my $pm = Parallel::ForkManager->new($forks);

With the Parallel::ForkManager object available, it is now possible to use it as part of a loop. A foreach loop works well though only a single array can be used with hashes being needed when other collections need interrogation. Two extra statements are needed with one to start a child process and another to end it.

foreach $t (@array) { my $pid = $pm->start and next; << Other code to be processed >> $pm->finish; }

Since there often is other processing performed by script and it is possible to have multiple threaded loops in one, there needs to be a way of getting the parent process to wait until all the child processes have completed before moving from one step to another in the main script and that is what the following statement does. In short, it adds more control.

$pm->wait_all_children;

To close, there needs to be a comment on the advantages of parallel processing. Modern multi-core processors often get used in single threaded operations and that leaves most of the capacity unused. Utilising this extra power then shortens processing times markedly. To give you an idea of what can be achieved, I had a single script taking around 2.5 minutes to complete in single threaded mode while setting the maximum number of threads to 24 reduced this to just over half a minute while taking up 80% of the processing capacity. This was with an AMD Ryzen 7 2700X CPU with eight cores and a maximum of 16 processor threads. Surprisingly, using 16 as the maximum thread number only used half the processor capacity so it seems to be a matter of performing one’s own measurements when making these decisions.

Creating a data-driven informat in SAS

27th September 2019

Recently, I needed to create some example data with an extra numeric identifier variable that would be assigned according to the value of a character identifier variable. Not wanting to add another dataset merge or join to the code, I decided to create an informat from data. Initially, I looked into creating a format instead but it not accomplish what I wanted to do.

data patient; keep fmtname start end label type; set test.dm; by subject; fmtname="PATIENT"; start=subject; end=start; label=patient; type="I"; run;

The input data needed a little processing as shown above. The format name was added in the variable FMTNAME and TYPE variable was assigned a value of I to make this a numeric informat; to make character equivalent, a value of J is assigned. The START and END variables declare the value range associated with the value of LABEL that would become the actual value of the numeric identifier variable. The variable names are fixed because the next step will not work with different ones.

proc format lib=work cntlin=patient; run; quit;

To create the actual informat, the dataset is read by a FORMAT procedure with the CNTLIN parameter specifying the name of the input dataset and LIB defining the library where the format catalog is stored. When this in complete, the informat is available for use with an input function as shown in the code excerpt below.

data ae1; set ae; patient=input(subject,patient.); run;

Compressing an entire dataset library using SAS

26th September 2019

Turning dataset compression for SAS datasets can produce quite a reduction in size so it often is standard practice to do just this. It can be set globally with a single command and many working systems do this for you:

options compress=yes;

It also can be done on a dataset by dataset option by adding (compress=yes) beside the dataset name on the data line of a data step. The size reduction can be gained retrospectively too but only if you create additional copies of the datasets. The following code uses the DATASETS procedure to accomplish this end at the time of copy the datasets from one library to another:

proc datasets lib=adam nolist; copy inlib=adam outlib=adamc noclone datecopy memtype=data; run; quit;

The NOCLONE option on the COPY statement allows the compression status to be changed while the DATECOPY one keeps the same date/time stamp on the new file at the operating system level. Lastly, the MEMTYPE=DATA setting ensures that only datasets are processed while the NOLIST option on the PROC DATASETS line suppresses listing of dataset information.

It may be possible to do something like this with PROC COPY too but I know from use that the option presented here works as expected. It also does not involve much coding effort, which helps a lot.

Getting Eclipse to start without incompatibility errors on Linux Mint 19.1

12th June 2019

Recent curiosity about Java programming and Groovy scripting got me trying to start up the Eclipse IDE that I had install on my main machine. What I got instead of a successful application startup was a message that included the following:

!MESSAGE Exception launching the Eclipse Platform: !STACK java.lang.ClassNotFoundException: org.eclipse.core.runtime.adaptor.EclipseStarter at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:466) at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:566) at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:499) at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:626) at org.eclipse.equinox.launcher.Main.basicRun(Main.java:584) at org.eclipse.equinox.launcher.Main.run(Main.java:1438) at org.eclipse.equinox.launcher.Main.main(Main.java:1414)

The cause was a mismatch between Eclipse and the installed version of Java that it needed in order to run. After all, the software itself is written in the Java language and the installed version from the usual software repositories was too old for Java 11. The solution turned out to be installing a newer version as a Snap (Ubuntu’s answer to Flatpak). The following command did the needful since Snapd already was running on my machine:

sudo snap install eclipse --classic

The only part of the command that warrants extra comment is the --classic switch since that is needed for a tool like Eclipse that needs to access a host file system. On executing, the software was downloaded from Snapcraft and then installed within its own bundle of dependencies. The latter adds a certain detachment from the underlying Linux installation and ensures that no messages appear because of incompatibilities like the one near the start of this post.

Using NOT IN operator type functionality in SAS Macro

9th November 2018

For as long as I have been programming with SAS, there has been the ability to test if a variable does or does not have one value from a list of values in data step IF clauses or WHERE clauses in both data step and most if not all procedures. It was only within the last decade that its Macro language got similar functionality with one caveat that I recently uncovered: you cannot have a NOT IN construct. To get that, you need to go about things in a different way.

In the example below, you see the NOT operator being placed before the IN operator component that is enclosed in parentheses. If this is not done, SAS produces the error messages that caused me to look at SAS Usage Note 31322. Once I followed that approach, I was able to do what I wanted without resorting to older more long-winded coding practices.

options minoperator;

%macro inop(x);

%if not (&x in (a b c)) %then %do; %put Value is not included; %end; %else %do; %put Value is included; %end;

%mend inop;

%inop(a);

Running the above code should produce a similar result to another featured on here in another post but the logic is reversed. There are times when such an approach is needed. One is where a small number of possibilities is to be excluded from a larger number of possibilities. Programming often involves more inventive thinking and this may be one of those.

WARNING: No bars were drawn. This could have been caused by ORDER= on the AXIS statement. You might wish to use the MIDPOINTS= option on the VBAR statement instead.

25th September 2015

What you see above is a an error issued by a SAS program like what a colleague at work recently found. The following code will reproduce this so let us walk through the steps to explain a possible cause for this.

The first stage is to create a test dataset containing variables y and x, for the vertical and midpoint axes, respectively, and populating these using a CARDS statement in a data step:

data a; input y x; cards; 1 5 3 9 ; run;

Now, we define an axis with tick marks for particular values that will be used as the definition for the midpoint or horizontal axis of the chart:

axis1 order=(1 3);

Then, we try creating the chart using the GCHART procedure that comes with SAS/GRAPH and this is what results in the error message being issued in the program log:

proc gchart data=a; vbar x / freq=y maxis=axis1; run; quit;

The cause is that the midpoint axis tick marks are no included in the data so changing these to the actual values of the x variable removes the message and allows the creation of the required chart. Thus, the AXIS1 statement needs to become the following:

axis1 order=(5 9);

Another solution is to remove the MAXIS option from the VBAR statement and let GCHART be data driven. However, if requirements do not allow this, create a shell dataset with all expected values for the midpoint axis with y set 0 since that is used for presenting frequencies as per the FREQ option in the VBAR statement.

« Older Entries « » Newer Entries »

Topics Discussed

Translate