16th February 2022
Custom error messages are good to add to SAS macros but you can get inconsistent colouration of the message text in multi-line messages. That was something that I just overlooked until I recently came across a solution. That is to use a hyphen at the end of the ERROR/WARNING/NOTE prefix instead of the more usual colon. Any prefixes ending on a hyphen are not included in the log text and the colouration ignores the carriage return that ordinary would change the text colour to black. The simple macro below demonstrates the effect.
%put ERROR: this is a test;
%put ERROR- this is another test;
%put WARNING: this is a test;
%put WARNING- this is another test;
%put NOTE: this is a test;
%put NOTE- this is another test;
ERROR: this is a test
this is another test
WARNING: this is a test
this is another test
NOTE: this is a test
this is another test
14th October 2021
My response to changes in the technology stack used in clinical research is to develop some familiarity with programming and scripting platforms that complement and compete with SAS, a system with which I have been programming since 2000. One of these has been R but Python is another that has taken up my attention and I now also have Julia in my sights as well. There may be others to assess in the fullness of time.
While I first started to explore the Data Science world in the autumn of 2017, it was in the autumn of 2019 that I began to complete LinkedIn training courses on the subject. Good though they were, I find that I need to actually use a tool in order to better understand it. At that time, I did get to hear about Python packages like Pandas, NumPy, SciPy, Scikit-learn, Matplotlib, Seaborn and Beautiful Soup though it took until of spring of this year for me to start gaining some hands-on experience with using any of these.
During the summer of 2020, I attended a BCS webinar on the CodeGrades initiative, a programming mentoring scheme inspired by the way classical musicianship is assessed. In fact, one of the main progenitors is a trained classical musician and teacher of classical music who turned to Python programming when starting a family so as to have a more stable income. The approach is that a student selects a project and works their way through it with mentoring and periodic assessments carried out in a gentle and discursive manner. Of course, the project has to be engaging for the learning experience to stay the course and that point came through in the webinar.
That is one lesson that resonates with me with subjects as diverse as web server performance and the ongoing pandemic pandemic supplying data and there are other sources of public data to examine as well before looking through my own personal archive gathered over the decades. Some subjects are uplifting while others are more foreboding but the key thing is that they sustain interest and offer opportunities for new learning. Without being able to dream up new things to try, my knowledge of R and Python would not be as extensive as it is and I hope that it will help with learning Julia too.
In the main, my own learning has been a solo effort with consultation of documentation along with web searches that have brought me to the likes of Real Python, Stack Abuse, Data Viz with Python and R and others for longer tutorials as well as threads on Stack Overflow. Usually, the web searching begins when I need a steer on a particular or a way to resolve a particular error or warning message but books always are worth reading even if that is the slower route. Those from the Dummies series or from O’Reilly have proved must useful so far but I do need to read them more completely than I already have; it is all too tempting to go with the try the “programming and search for solutions as you go” approach instead.
To get going, many choose the Anaconda distribution to get Jupyter notebook functionality but I prefer a more traditional editor so Spyder has been my tool of choice for Python programming and there are others like PyCharm as well. Spyder itself is written in Python so it can be installed using pip from PyPi like other Python packages. It has other dependencies like Pylint for code management activities but these get installed behind the scenes.
The packages that I first met in 2019 may be the mainstays for doing data science but I have discovered others since then. It also seems that there is porosity between the worlds of R an Python so you get some Python packages aping R packages and R has the Reticulate package for executing Python code. There are Python counterparts to such Tidyverse stables as dply and ggplot2 in the form of Siuba and Plotnine, respectively. The syntax of these packages are not direct copies of what is executed in R but they are close enough for there to be enough familiarity for added user friendliness compared to Pandas or Matplotlib. The interoperability does not stop there for there is SQLAlchemy for connecting to MySQL and other databases (PyMySQL is needed as well) and there also is SASPy for interacting with SAS Viya.
Pyhton may not have the speed of Julia but there are plenty of packages for working with larger workloads. Of these, Dask, Modin and RAPIDS all have there uses for dealing with data volumes that make Pandas code crawl. As if to prove that there are plenty of libraries for various forms of data analytics, data science, artificial intelligence and machine learning, there also are the likes of Keras, TensorFlow and NetworkX. These are just a selection of what is available and there is no need not to check out more. It may be tempting to stick with the most popular packages all the time, especially when they do so much, but it never hurst to keep an open mind either.
10th October 2021
Recently, a client of mine updated one of their systems from SAS 9.4 M5 to SAS 9.4 M7. In spite of performing due diligence regarding changes between the maintenance release, a change in behaviour of the SYSODSESCAPECHAR automatic macro variable surprised them. The macro variable captures the assignment of the ODS escape character used to prefix RTF codes for page numbering and other things. That setting is made using an ODS ESCAPECHAR statement like the following:
In the M5 release, the tilde character in this example was output by the automatic macro variable but that changed in the M7 release to 7E, the hexadecimal code for the same and this tripped up one of their validated macro programs used in output production. The adopted solution was to use the escape sequence (*ESC*) that gave the same outcome that was there before the change. That was less verbose than alternative code changing the hexadecimal code into the expected ASCII character that follows.
The above supplies a hexadecimal code to the BYTE function for correct rendering with the SYMPUT routine assigning the resulting value to a macro variable named new. Just using the escape sequence is far more succinct though there is now an added validation need once user pilot testing has completed. In my line of business, the updating of code is the quickest part of many such changes; documentation and testing always take longer.
11th September 2021
The thrust of an exhortation from a computing handbook publisher comes to mind here: don’t just look things up on Google, read a book so you really understand what you are doing. Something like those words was used to sell an eBook on Github but the same sentiment applies to R or any other computing language. Using a search engine will get you going or add to existing knowledge but only a book or a training course will help to embed real competence.
In the case of R, there is a myriad of blogs out there that can be consulted as well as function and package documentation on RDocumentation or rrdr.io. For the former, R-bloggers or R Weekly can make good places to start while ones like Stats and R, Statistics Globe, STHDA, PSI’s VIS-SIG and anything from Posit (including their main blog as well as their AI one) can be worth consulting. Additionally, there is also RStudio Education and the NHS-R Community, which also have a Github repository together with a YouTube channel. Many packages have dedicated websites as well so there is no lack of documentation with all of these so here is a selection:
Distill for R Markdown
Databases using R
COVID-19 Data Hub
To come to the real subject of this post, R is unusual in that books that you can buy also have companions websites that contain the same content with the same structure. Whatever funds this approach (and some appear to be supported by RStudio itself by the looks of things), there certainly are a lot of books available freely online in HTML as you will see from the list below while a few do not have a print counterpart as far as I know:
Big Book of R
R Programming for Data Science
Hands-On Programming with R
Cookbook for R
R Graphics Cookbook
R Markdown: The Definitive Guide
R Markdown Cookbook
RMarkdown for Scientists
bookdown: Authoring Books and Technical Documents with R Markdown
blogdown: Creating Websites with R Markdown
pagedown: Create Paged HTML Documents for Printing from R Markdown
Dynamic Documents with R and knitr
Engineering Production-Grade Shiny Apps
Outstanding User Interfaces with Shiny
Mastering Spark with R
Happy Git and GitHub for the useR
HTTP Testing in R
Outstanding User Interfaces with Shiny
Engineering Production-Grade Shiny Apps
The Shiny AWS Book
Many of the above have counterparts published by O’Reilly or Chapman & Hall, to name the two publishers that I have found so far. Aside from sharing these with you, there is also the personal motivation of having the collection of links somewhere so I can close tabs in my Firefox session. There are other web articles open in other tabs that I need to retain and share but these will need to do for now and I hope that you find them as useful as I do.
18th April 2021
Recently, I shared my thoughts on learning new computing languages by oneself using books, online research and personal practice. As successful as that can be, there remains a place for getting some actual instruction as well. Maybe that is why so many turn to YouTube, where there is a multitude of video channels offering such possibilities without cost. What I have also discovered is that this is complemented by a host of other providers whose services attract a fee, and there will be a few of those mentioned later in this post. Paying for online courses does mean that you can get the benefit of curation and an added assurance of quality in what appears to be a growing market.
The variation in quality can dog the YouTube approach, and it also can be tricky to find something good, even if the platform does suggest new videos based on what you have been watching. Much of what is found there does take the form of webinars from the likes of the Why R? Foundation, Posit or the NHSR Community. These can be useful, and there are shorter videos from such providers as the Association of Computing Machinery or SAS Users. These do help more if you already have some knowledge about the topic area being discussed, so they may not make the best starting points for someone who is starting from scratch.
Of course, working your way through a good book will help, and it is something that I have been known to do, but supplementing this with one or more video courses really adds to the experience and I have done a few of these on LinkedIn. That part of the professional platform came from the acquisition of Lynda.com and the topic areas range from soft skills like time management through to computing skills courses with R, SAS and Python seeing coverage among the data science portfolio. Even O’Reilly has ventured into the area in an expansion from the book publishing activities for which so many of us know the organisation.
The available online instructor community does not stop at the above since there are others like Degreed, Baeldung, Udacity, Programiz, Udemy, Business Science and Datanovia. Some of these tend towards online education provision that feels more like an online university course and those are numerous as well as you will find through Data Science Central or KDNuggets. Both of these earn income from advertising to pay for featured blog posts and newsletters, while the former also organises regular webinars and was my first port of call when I became curious about the world of data science during the autumn of 2017.
My point of approach into the world of online training has been as a freelance information professional needing to keep up to date with a rapidly changing field. The mix of content that is both free of charge and that which attracts a fee is one that can work. Both kinds do complement each other while possessing their unique advantages and disadvantages. The need to continually expand skills and knowledge never goes away, so it is well worth spending some time working what you are after, since you need to be sure that any training always adds to your own knowledge and skill level.
26th December 2019
Recently, I needed to inactivate blocks of code in a Perl script while doing some testing. This is something that I often do in other computing languages so I sought the same in Perl. To do that, I need to use the POD methodology. This meant enclosing the code as follows.
<< Code to be inactivated by inclusion in a comment >>
The =start line could use any word after the equality sign but it seems that =cut is needed to close the multi-line comment. If this was actual programming documentation, then the comment block should include some meaningful text for use with perldoc but that was not a concern here since the commenting statements would be removed afterwards anyway and it is good practice not to leave commented code in a production script or program to avoid any later confusion.
In my case, this facility allowed me to isolate the code that I needed to alter and test before putting everything back as needed. It also saved time since I did not need to individually comment out every executable line because multiple lines could be inactivated at a time.
9th November 2018
For as long as I have been programming with SAS, there has been the ability to test if a variable does or does not have one value from a list of values in data step IF clauses or WHERE clauses in both data step and most if not all procedures. It was only within the last decade that its Macro language got similar functionality with one caveat that I recently uncovered: you cannot have a NOT IN construct. To get that, you need to go about things in a different way.
In the example below, you see the NOT operator being placed before the IN operator component that is enclosed in parentheses. If this is not done, SAS produces the error messages that caused me to look at SAS Usage Note 31322. Once I followed that approach, I was able to do what I wanted without resorting to older more long-winded coding practices.
%if not (&x in (a b c)) %then %do;
%put Value is not included;
%put Value is included;
Running the above code should produce a similar result to another featured on here in another post but the logic is reversed. There are times when such an approach is needed. One is where a small number of possibilities is to be excluded from a larger number of possibilities. Programming often involves more inventive thinking and this may be one of those.
12th August 2014
It appears that PROC SGPLOT and other statistical graphics procedures create image files even if you are creating RTF or PDF files. By default, these are PNG files but there are other possibilities. When working with PC SAS , I have seen them written to the current working directory and that could clutter up your folder structure, especially if they are unwanted.
Being unable to track down a setting that controls this behaviour, I resolved to find a way around it by sending the files to the SAS work directory so they are removed when a SAS session is ended. One option is to set the session’s working directory to be the SAS work one and that can be done in SAS code without needing to use the user interface. As a result, you get some automation.
The method is implicit though in that you need to use an X statement to tell the operating system to change folder for you. Here is the line of code that I have used:
x "cd %sysfunc(pathname(work))";
The X statement passes commands to an operating system’s command line and they are enclosed in quotes. %sysfunc then is a macro command that allows certain data step functions or call routines as well as some SCL functions to be executed. An example of the latter is pathname and this resolves library or file references and it is interrogating the location of the SAS work library here so it can be passed to the operating systems cd (change directory) command for processing. This method works on Windows and UNIX so Linux should be covered too, offering a certain amount of automation since you don’t have to specify the location of the SAS work library in every session due to the folder name changing all the while.
Of course, if someone were to tell me of another way to declare the location of the generated PNG files that works with RTF and PDF ODS destinations, then I would be all ears. Even direct output without image file creation would be even better. Until then though, the above will do nicely.
4th December 2013
Recently, I needed to put in place some code to detect the presence or absence of a variable in a dataset and I chose SAS Macro programming as the way to do what I wanted. The logic was based on a SAS sample that achieved the same result in a data step and some code that I had for detecting the presence or absence of a dataset. Mixing the two together gave me something like the following:
%if &dsid > 0 %then %let rc=%sysfunc(close(&dsid));
%if &varexist gt 0 %then %put Info: Variable &var is in the &ds dataset;
%else %put Info: Variable &var is not in the &ds dataset;
What this does is open up a dataset and look for the variable number in the dataset. In datasets, variables are numbered from left to right with 1 for the first one, 2 for the second and so on. If the variable is not in the dataset, the result is 0 so you know that it is not there. All of this is what the VARNUM SCL function within the SYSFUNC macro function does. In the example, this resolves to %sysfunc(varnum(&dsid,var)) with no quotes around the variable name like you would do in data step programming. Once you have the variable number or 0, then you can put in place some conditional logic that makes use of the information like what you see in the above simple example. Of course, that would be expanded to something more useful in real life but I hope it helps to show you the possibilities here.
12th November 2012
If you had asked me about getting two or more graphs on a page using SAS/GRAPH procedures, I might have suggested PROC GREPLAY as the means to achieve it. However, I recently came across another way to do the same thing by using ODS. It helped that the graphs were being produced using the PDF destination because I don’t think that what follows will work with the RTF one.
For this three plots on a page example, I first set the orientation to landscape so that the plots can be arranged side by side in a single row:
Next, the PDF destination was opened with page breaks turned off for the required output file using the STARTPAGE option:
ods pdf file="c:\test.pdf" startpage=off;
The listing destination was turned off as well since it is not needed:
ods listing close;
With that complete, a page or region break gets inserted. This could have been repeated before every procedure to get it popped into the next region on the page but that is the default behaviour for any extra procedural step so it wasn’t needed.
ods pdf startpage=now;
Then, the ODS LAYOUT feature is started so that the layout can be defined on the page:
ods layout start;
For the first plot and the one at the left of the triptych, a region was defined absolutely (grid layouts are available but I didn’t make use of them here) using ODS REGION. Since all plots were to be of the same size, the width was defined as being a third of the page and the bottom left hand corner of the region defined to be the same as that of the plot area on the page. Titles and footnotes usefully lie outside this region in the way that SAS arranges things so there is no further messing. With the region define, it’s a matter of running the required SAS/GRAPH procedure. In my case, this was GPLOT but I am certain that others would work as well. The height was defined as the full possible plot height. This could have a use if I wanted more than one row of graphs on a page with a trellis plot being an example of that sort of arrangement.
ods region x=0pct y=0pct width=33pct height=100pct;
<< SAS/GRAPH Procedure >>
For the middle plot, the starting position is moved a third of the way along the page while the section area has the same dimensions as before. Using percentages in these definitions does make their usage easier.
ods region x=33pct y=0pct width=33pct height=100pct;
<< SAS/GRAPH Procedure >>
Lastly, the right-hand plot has a starting position two-thirds of the width of the page and the other dimensions are as per the other panels:
ods region x=66pct y=0pct width=33pct height=100pct;
<< SAS/GRAPH Procedure >>
With the last graph created, it is time to close ODS LAYOUT and the PDF destination. Then, the listing destination is reopened again.
ods layout end;
ods pdf close;
Update 2012-12-08: Since writing the above, I have learned that ODS LAYOUT and ODS REGION have yet to become production features of SAS with 9.3 as the latest version. However, I have encountered an alternative that uses the STARTPAGE=NEVER ODS PDF option to turn off page breaks and GOPTIONS statements to control the regions occupied by plots. It’s Sample 48569 on the SAS website. Having a production equivalent is better since pre-production features are best avoided in production code. If I had realised the status, I would have used PROC GREPLAY to achieve what I needed to do.