Adventures & experiences in contemporary technology
Recently, I got to experimenting with Ansible after reading about the orchestration tool in a copy of Admin magazine. It came in handy for updating a few web servers that I have as well as updating my main Linux workstation. For the former, automated entry of SSH passwords sufficed but the same did not apply for sudo usage on my local machine. This meant that I needed to use Ansible Vault to store the administrator password and doing so opened up a file in the Vi editor. since I am not familiar with Vi and wanted to get things sorted quickly, I fancied using something more user-friendly like Nano.
Doing this meant adding the following line to .bashrc:
Saving and closing the file followed by reloading the session set me up for what was needed.
There is a simple principal with the 7G Firewall from Perishable press: it is a set of mod_rewrite rules for the Apache web server that can be added to a .htaccess file and there also is a version for the Nginx web server as well. These check query strings, request Uniform Resource Identifiers (URI’s), user agents, remote hosts, HTTP referrers and request methods for any anomalies and blocks those that appear dubious.
Unfortunately, I found that the rules heavily slowed down a website with which I tried them so I am going have to wait until that is moved to a faster system before I really can give them a go. This can be a problem with security tools as I also found with adding a modsec jail to a Fail2Ban instance. As it happens, both sets of observations were made using the GTmetrix tool so it seems that there is a trade off between security and speed that needs to be assessed before adding anything to block unwanted web visitors.
My response to changes in the technology stack used in clinical research is to develop some familiarity with programming and scripting platforms that complement and compete with SAS, a system with which I have been programming since 2000. One of these has been R but Python is another that has taken up my attention and I now also have Julia in my sights as well. There may be others to assess in the fullness of time.
While I first started to explore the Data Science world in the autumn of 2017, it was in the autumn of 2019 that I began to complete LinkedIn training courses on the subject. Good though they were, I find that I need to actually use a tool in order to better understand it. At that time, I did get to hear about Python packages like Pandas, NumPy, SciPy, Scikit-learn, Matplotlib, Seaborn and Beautiful Soup though it took until of spring of this year for me to start gaining some hands-on experience with using any of these.
During the summer of 2020, I attended a BCS webinar on the CodeGrades initiative, a programming mentoring scheme inspired by the way classical musicianship is assessed. In fact, one of the main progenitors is a trained classical musician and teacher of classical music who turned to Python programming when starting a family so as to have a more stable income. The approach is that a student selects a project and works their way through it with mentoring and periodic assessments carried out in a gentle and discursive manner. Of course, the project has to be engaging for the learning experience to stay the course and that point came through in the webinar.
That is one lesson that resonates with me with subjects as diverse as web server performance and the ongoing pandemic pandemic supplying data and there are other sources of public data to examine as well before looking through my own personal archive gathered over the decades. Some subjects are uplifting while others are more foreboding but the key thing is that they sustain interest and offer opportunities for new learning. Without being able to dream up new things to try, my knowledge of R and Python would not be as extensive as it is and I hope that it will help with learning Julia too.
In the main, my own learning has been a solo effort with consultation of documentation along with web searches that have brought me to the likes of Real Python, Stack Abuse, Data Viz with Python and R and others for longer tutorials as well as threads on Stack Overflow. Usually, the web searching begins when I need a steer on a particular or a way to resolve a particular error or warning message but books always are worth reading even if that is the slower route. Those from the Dummies series or from O’Reilly have proved must useful so far but I do need to read them more completely than I already have; it is all too tempting to go with the try the “programming and search for solutions as you go” approach instead.
To get going, many choose the Anaconda distribution to get Jupyter notebook functionality but I prefer a more traditional editor so Spyder has been my tool of choice for Python programming and there are others like PyCharm as well. Spyder itself is written in Python so it can be installed using pip from PyPi like other Python packages. It has other dependencies like Pylint for code management activities but these get installed behind the scenes.
The packages that I first met in 2019 may be the mainstays for doing data science but I have discovered others since then. It also seems that there is porosity between the worlds of R an Python so you get some Python packages aping R packages and R has the Reticulate package for executing Python code. There are Python counterparts to such Tidyverse stables as dply and ggplot2 in the form of Siuba and Plotnine, respectively. The syntax of these packages are not direct copies of what is executed in R but they are close enough for there to be enough familiarity for added user friendliness compared to Pandas or Matplotlib. The interoperability does not stop there for there is SQLAlchemy for connecting to MySQL and other databases (PyMySQL is needed as well) and there also is SASPy for interacting with SAS Viya.
Pyhton may not have the speed of Julia but there are plenty of packages for working with larger workloads. Of these, Dask, Modin and RAPIDS all have there uses for dealing with data volumes that make Pandas code crawl. As if to prove that there are plenty of libraries for various forms of data analytics, data science, artificial intelligence and machine learning, there also are the likes of Keras, TensorFlow and NetworkX. These are just a selection of what is available and there is no need not to check out more. It may be tempting to stick with the most popular packages all the time, especially when they do so much, but it never hurst to keep an open mind either.
Recently, a client of mine updated one of their systems from SAS 9.4 M5 to SAS 9.4 M7. In spite of performing due diligence regarding changes between the maintenance release, a change in behaviour of the SYSODSESCAPECHAR automatic macro variable surprised them. The macro variable captures the assignment of the ODS escape character used to prefix RTF codes for page numbering and other things. That setting is made using an ODS ESCAPECHAR statement like the following:
In the M5 release, the tilde character in this example was output by the automatic macro variable but that changed in the M7 release to 7E, the hexadecimal code for the same and this tripped up one of their validated macro programs used in output production. The adopted solution was to use the escape sequence (*ESC*) that gave the same outcome that was there before the change. That was less verbose than alternative code changing the hexadecimal code into the expected ASCII character that follows.
The above supplies a hexadecimal code to the BYTE function for correct rendering with the SYMPUT routine assigning the resulting value to a macro variable named new. Just using the escape sequence is far more succinct though there is now an added validation need once user pilot testing has completed. In my line of business, the updating of code is the quickest part of many such changes; documentation and testing always take longer.
During the past week, I rebooted my system only to find that a number of things no longer worked and my Pi-hole DNS server was among them. Having exhausted other possibilities by testing out things on another machine, I did a status check when I spotted a line like the following in my system logs and went investigating further:
cron: (root) INSECURE MODE (mode 0600 expected) (crontabs/root)
It turned out to be more significant than I had expected because this was why every CRON job was failing and that included the network set up needed by Pi-hole; a script is executed using the @reboot directive to accomplish this and I got Pi-hole working again by manually executing it. The evening before, I did make some changes to file permissions under /var/www but I was not expecting it to affect other parts of /var though that may have something to do with some forgotten heavy-handedness. The cure was to issue a command like the following for execution in a terminal session:
sudo chmod -R 600 /var/spool/cron/crontabs/
Then, CRON itself needed starting since it had not not running at all and executing this command did the needful without restarting the system:
sudo systemctl start cron
That outcome was proved by executing the following command to issue some terminal output that include the welcome text “active (running)” highlighted in green:
sudo systemctl status cron
There was newly updated output from a frequently executing job that checked on web servers for me but this was added confirmation. It was a simple solution to a perplexing situation that led up all sorts of blind alleys before I alighted on the right solution to the problem.
Recently, I had someone ask me how to create PNG files in SAS using ODS Graphics so I sought out the answer for them. Normally, the suggestion would have been to create RTF or PDF files instead but there was a specific need so a different approach was taken. Adding the something like following lines before an SGPLOT, SGPANEL or SGRENDER procedure does the needful:
ods listing gpath='E:\';
ods graphics / imagename="test" imagefmt=png;
Here, the ODS LISTING statement declares the destination for the desired graphics file while the ODS GRAPHICS statement defines the file name and type. In the above example, the file test.png would be created in the root of the E drive of a Windows machine. However, this also works with Linux or UNIX directory paths.
Having made plenty of use of grep on the Linux/UNIX command and findstr on the legacy Windows command line, I wondered if PowerShell could be used to search the contents of files for a text string. Usefully, this turns out to be the case but I found that the native functionality does not use what I have used before. The form of the command is given below:
Select-String -Path <filename search expression> -Pattern "<search expression>" > <output file>
While you can have the output appear on screen, it always seems easier to send it to a file for subsequent and that is what I am doing above. The input to the -Path switch can be a filename or a wildcard expression while that to the -Pattern can be a text string enclosed in quotes or a regular expression. It works well once you know what to do so here is an example:
Select-String -Path *.sas -Pattern "proc report" > c:\temp\search.txt
The search.txt file then includes both the file information and the text that has been found for sake of checking that you have what you want. What you do next is up to you.
One of the benefits of running Linux is the availability of virtual desktops and installing VirtuaWin was the only way to get the same functionality in Windows before the launch of Windows 10. For reasons known to Microsoft, they decided against the same sort of implementation as seen in Linux or UNIX. Instead, they put the virtual desktop functionality a click away and rather hides it from most users unless they know what clicking on the Task View button allows. The approach also made switching between desktops slower with a mouse. However, there are keyboard shortcuts that addresses this once multiple virtual desktops exist.
Using WIN+CTRL+LEFT or WIN+CTRL+RIGHT does this easily once you have mastered the action. Depending on your keyboard setup, WIN is the Windows, Super or Command key while CTRL is the Control key. Then, LEFT is the left arrow key and RIGHT is the right arrow key. For machines with smaller screens where multi-tasking causes clutter, virtual desktops are a godsend for organising how you work and having quick key combinations for switching between them adds to their utility.
BASH is a command-line interpretor that is commonly used by Linux and UNIX operating systems. Chances are that you will find find yourself in a BASH session if you start up a terminal emulator in many of these though there are others like KSH and SSH too.
BASH comes with its own configuration files and one of these is located in your own home directory, .bashrc. Among other things, it can become a place to store command shortcuts or aliases. Here is an example:
alias us=’sudo apt-get update && sudo apt-get upgrade’
Such a definition needs there to be no spaces around the equals sign and the actual command to be declared in single quotes. Doing anything other than this will not work as I have found. Also, there are times when you want to update or add one of these and use it without shutting down a terminal emulator and restarting it.
To reload the .bashrc file to use the updates contained in there, one of the following commands can be issued:
Both will read the file and execute its contents so you get those updates made available so you can continue what you are doing. There appears to be a tendency for this kind of thing in the world of Linux and UNIX because it also applies to remounting drives after a change to /etc/fstab and restarting system services like Apache, MySQL or Nginx. The command for the former is below:
sudo mount -a
Often, the means for applying the sorts of in-situ changes that you make are simple ones too and anything that avoids system reboots has to be good since you have less work interruptions.
In the world of UNIX and Linux, symbolic links are shortcuts but they do not work like normal Windows shortcuts because you do not jump from one location to another with the file manager’s address bar changing what it shows. Instead, it is as if you see the contents of the directory at another quicker to access location in the file system and the same sort of thinking applies to files too. In some ways, it is like giving files and directories alternative aliases. There are soft links that point to the name of a given directory or file and hard links that point to actual files or directories.
For a long time, I was under the mistaken impression that such things did not exist in Windows until I came across the mklink command, which came with the launch of Windows Vista at the start of 2007. While the then new feature may not be an obvious to most of us, it does does that Windows did incorporate a feature from UNIX and Linux well before the advent of virtual desktops on Windows 10.
By default, the aforementioned command sets up symbolic links to files and the /D switch allows the same to be done for directories too. The /H switch makes a hard link instead of a soft link so we get much of the functionality of the ln command in UNIX and Linux. Here is an example that creates a soft symbolic link for a directory:
mklink /D shortcut target_directory
Above, shortcut is the name of the symbolic link file and target_directory is the destination to which it links. In my experience, it works best for destinations beyond your home folder and, from what I have read, hard links may not be possible across different disks either.