TOPIC: SCRIPTING LANGUAGES
Removing duplicate characters from strings using BASH scripting
30th March 2023Recently, I wanted to extract some text from the Linux command by word number only for multiple spaces to make things less predictable. The solution was to remove the duplicate spaces. This can be done using sed
, but you add the complexity of regular expressions if you opt for that solution. Instead, the tr
command offers a neater approach. For removing duplicate spaces, the command takes the following form:
echo "test test" | tr -s " "
Since I was piping some text to the command, that is what I have above. The tr
command is intended to replace or delete characters, and the -s switch is a shorthand for --squeeze-repeats. The actual character to be deduplicated is passed in quotes at the end; here, it is a space, but it could be anything that is duplicated. The resulting text in this example becomes:
test test
After the processing, there is now only one space separating the two words, which is the solution that I sought. It certainly cut out any variability that I was encountering in my usage.
Rendering Markdown into HTML using PHP
3rd December 2022One of the good things about using virtual private servers for hosting websites instead of shared hosting or using a web application service like WordPress.com or Tumblr is that you get added control and flexibility. There was a time when HTML, CSS and client-side scripting were all that was available from the shared hosting providers that I was using back then. Then, static websites were my lot until it became possible to use Perl server side scripting. PHP predominates now, but Python or Ruby cannot be discounted either.
Being able to install whatever you want is a bonus as well, though it means that you also are responsible for the security of the containers that you use. There will be infrastructure security, but that of your own machine will be your own concern. Added power always means added responsibility, as many might say.
The reason that these thought emerge here is that getting PHP to render Markdown as HTML needs the installation of Composer. Without that, you cannot use the CommonMark package to do the required back-work. All the command that you see here will work on Ubuntu 22.04. First, you need to download Composer and executing the following command will accomplish this:
curl https://getcomposer.org/installer -o /tmp/composer-setup.php
Before the installation, it does no harm to ensure that all is well with the script before proceeding. That means that capturing the signature for the script using the following command is wise:
HASH=`curl https://composer.github.io/installer.sig`
Once you have the script signature, then you can check its integrity using this command:
php -r "if (hash_file('SHA384', '/tmp/composer-setup.php') === '$HASH') { echo 'Installer verified'; } else { echo 'Installer corrupt'; unlink('composer-setup.php'); } echo PHP_EOL;"
The result that you want is "Installer verified". If not, you have some investigating to do. Otherwise, just execute the installation command:
sudo php /tmp/composer-setup.php --install-dir=/usr/local/bin --filename=composer
With Composer installed, the next step is to run the following command in the area where your web server expects files to be stored. That is important when calling the package in a PHP script.
composer require league/commonmark
Then, you can use it in a PHP script like so:
define("ROOT_LOC",$_SERVER['DOCUMENT_ROOT']);
include ROOT_LOC . '/vendor/autoload.php';
use League\CommonMark\CommonMarkConverter;
$converter = new CommonMarkConverter();
echo $converter->convertToHtml(file_get_contents(ROOT_LOC . '<location of markdown file>));
The first line finds the absolute location of your web server file directory before using it when defining the locations of the autoload script and the required markdown file. The third line then calls in the CommonMark package, while the fourth sets up a new object for the desired transformation. The last line converts the input to HTML and outputs the result.
If you need to render the output of more than one Markdown file, then repeating the last line from the preceding block with a different file location is all you need to do. The CommonMark object persists and can be used like a variable without needing the reinitialisation to be repeated every time.
The idea of building a website using PHP to render Markdown has come to mind, but I will leave it at custom web pages for now. If an opportunity comes, then I can examine the idea again. Before, I had to edit HTML, but Markdown is friendlier to edit, so that is a small advance for now.
Resolving a clash between Homebrew and Python
22nd November 2022For reasons that I cannot recall now, I installed the Hugo static website generator on my Linux system and web servers using Homebrew. The only reason that I suggest is that it might have been a way to get the latest version at the time because Linux Mint only does major changes like that every two years, keeping it in line with long-term support editions of Ubuntu.
When Homebrew was installed, it changed the lookup path for command line executables by adding the following line to my .bashrc
file:
eval "$(/home/linuxbrew/.linuxbrew/bin/brew shellenv)"
This executed the following lines:
export HOMEBREW_PREFIX="/home/linuxbrew/.linuxbrew";
export HOMEBREW_CELLAR="/home/linuxbrew/.linuxbrew/Cellar";
export HOMEBREW_REPOSITORY="/home/linuxbrew/.linuxbrew/Homebrew";
export PATH="/home/linuxbrew/.linuxbrew/bin:/home/linuxbrew/.linuxbrew/sbin${PATH+:$PATH}";
export MANPATH="/home/linuxbrew/.linuxbrew/share/man${MANPATH+:$MANPATH}:";
export INFOPATH="/home/linuxbrew/.linuxbrew/share/info:${INFOPATH:-}";
While the result suits Homebrew, it changed the setup of Python and its packages on my system. Eventually, this had undesirable consequences, like messing up how Spyder started, so I wanted to change this. There are other things that I have automated using Python and these were not working either.
One way that I have seen suggested is to execute the following command, but I cannot vouch for this:
brew unlink python
What I did was to comment out the offending line in .bashrc
and replace it with the following:
export PATH="$PATH:/home/linuxbrew/.linuxbrew/bin:/home/linuxbrew/.linuxbrew/sbin"
export HOMEBREW_PREFIX="/home/linuxbrew/.linuxbrew";
export HOMEBREW_CELLAR="/home/linuxbrew/.linuxbrew/Cellar";
export HOMEBREW_REPOSITORY="/home/linuxbrew/.linuxbrew/Homebrew";
export MANPATH="/home/linuxbrew/.linuxbrew/share/man${MANPATH+:$MANPATH}:";
export INFOPATH="${INFOPATH:-}/home/linuxbrew/.linuxbrew/share/info";
The first command adds Homebrew paths to the end of the PATH variable rather than the beginning, which was the previous arrangement. This ensures system folders are searched for executable files before Homebrew folders. It also means Python packages load from my user area instead of the Homebrew location, which happened under Homebrew's default configuration. When working with Python packages, remember not to install one version at the system level and another in your user area, as this creates conflicts.
So far, the result of the Homebrew changes is not unsatisfactory, and I will watch for any rough edges that need addressing. If something comes up, then I will set things up in another way.
A look at the Julia programming language
19th November 2022Several open-source computing languages get mentioned when talking about working with data. Among these are R and Python, but there are others; Julia is another one of these. It took a while before I got to check out Julia because I felt the need to get acquainted with R and Python beforehand. There are others like Lua to investigate too, but that can wait for now.
With the way that R is making an incursion into clinical data reporting analysis following the passage of decades when SAS was predominant, my explorations of Julia are inspired by a certain contrariness on my part. Alongside some small personal projects, there has been some reading in (digital) book form and online. Concerning the latter of these, there are useful tutorials like Introduction to Data Science: Learn Julia Programming, Maths & Data Science from Scratch or Julia Programming: a Hands-on Tutorial. Like what happens with R, there are online versions of published books available free of charge, and they include Julia Data Science and Interactive Visualization and Plotting with Julia. Video learning can help too and Jane Herriman has recorded and shared useful beginner's guides on YouTube that start with the basics before heading onto more advanced subjects like multiple dispatch, broadcasting and metaprogramming.
This piece of learning has been made of simple self-inspired puzzles before moving on to anything more complex. That differs from my dalliance with R and Python, where I ventured into complexity first, not least because of testing them out with public COVID data. Eventually, I got around to doing that with Julia too, though my interest was beginning to wane by then, and Julia's abilities for creating multipage PDF files were such that the PDF Toolkit was needed to help with this. Along the way, I have made use of such packages as CSV.jl, DataFrames.jl, DataFramesMeta, Plots, Gadfly.jl, XLSX.jl and JSON3.jl, among others. After that, there is PrettyTables.jl to try out, and anyone can look at the Beautiful Makie website to see what Makie can do. There are plenty of other packages creating graphs, such as SpatialGraphs.jl, PGFPlotsX and GRUtils.jl. For formatting numbers, options include Format.jl and Humanize.jl.
So far, my primary usage has been with personal financial data together with automated processing and backup of photo files. The photo file processing has taken advantage of the ability to compile Julia scripts for added speed because just-in-time compilation always means there is a lag before the real work begins.
VS Code is my chosen editor for working with Julia scripts, since it has a plugin for the language. That adds the REPL, syntax highlighting, execution and data frame viewing capabilities that once were added to the now defunct Atom editor by its own plugin. While it would be nice to have a keyboard shortcut for script execution, the whole thing works well and is regularly updated.
Naturally, there have been a load of queries as I have gone along and the Julia Documentation has been consulted as well as Julia Discourse and Stack Overflow. The latter pair have become regular landing spots on many a Google search. One example followed a glitch that I encountered after a Julia upgrade when I asked a question about this and was directed to the XLSX.jl Migration Guides where I got the information that I needed to fix my code for it to run properly.
There is more learning to do as I continue to use Julia for various things. Once compiled, it does run fast like it has been promised. The syntax paradigm is akin to R and Python, but there are Julia-specific features too. If you have used the others, the learning curve is lessened but not eliminated completely. This is not an object-oriented language as such, but its functional nature makes it familiar enough for getting going with it. In short, the project has come a long way since it started more than ten years ago. There is much for the scientific programmer, but only time will tell if it usurped its older competitors. For now, I will remain interested in it.
Removing a Julia package
5th October 2022While I have been programming with SAS for a few decades, and it remains a linchpin in the world of clinical development in the pharmaceutical industry, other technologies like R and Python are gaining a foothold. Two years ago, I started to look at those languages with personal projects being a great way of facilitating this. In addition, I got to hear of Julia and got to try that too. That journey continues since I have put it into use for importing and backing up photos, and there are other possible uses too.
Recently, I updated Julia to version 1.8.2 but ran into a problem with the DataArrays
package that I had installed, so I decided to remove it since it was added during experimentation. Though the Pkg
package that is used for package management is documented, I had not got to that, which meant that some web searching ensued. It turns out that there are two ways of doing this. One uses the REPL: after pressing the ]
key, the following command gets issued:
rm DataArrays
When all is done, pressing the delete or backspace keys returns things to normal. This also can be done in a script as well as the REPL, and the following line works in both instances:
using Pkg; Pkg.rm("DataArrays")
While the semicolon is used to separate two commands issued on the same line, they can be on different lines or issued separately just as well. Naturally, DataArrays
is just an example here; you just replace that with the name of whatever other package you need to remove. Since we can get carried away when downloading packages, there are times when a clean-up is needed to remove redundant packages, so knowing how to remove any clutter is invaluable.
Accessing Julia REPL command history
4th October 2022In the BASH shell used on Linux and UNIX, the history command calls up a list of recent commands used and has many uses. There is a .bash_history file in the root of the user folder that logs and provides all this information, so there are times when you need to exclude some commands from there, but that is another story.
The Julia REPL environment works similarly to many operating system command line interfaces, so I wondered if there was a way to recall or refer to the history of commands issued. So far, I have not come across an equivalent to the BASH history command for the REPL itself, but there the command history is retained in a file like .bash_history. The location varies on different operating systems, though. On Linux, it is ~/.julia/logs/repl_history.jl
while it is %USERPROFILE%\.julia\logs\repl_history.jl
on Windows. While I tend to use scripts that I have written in VSCode rather than entering pieces of code in the REPL, the history retains its uses. Thus, I am sharing it here for others. In the past, the location changed, but these are the ones with Julia 1.8.2, the version that I have at the time of writing.
Getting custom Python imports to work in Visual Studio Code
18th February 2022While I continue to use Spyder as my preferred Python code editor, I also tried out Visual Studio Code. Handily, this Integrated Development Environment also has facilities for working with R and Julia code as well as Markdown text editing and adding the required extensions is enough for these applications; it helps that there is an unofficial Grammarly extension for content creation.
My Python code development makes use of the Pylance extension, and it works a little differently from Spyder when it comes to including files using import statements. Spyder will look into the folder where the base script is located, but the default behaviour of Pylance is that it looks in the root path of your workspace. This meant that any code that ran successfully in Spyder failed in Visual Studio Code.
To solve this issue, I added the location using the python.analysis.extraPaths
setting for the workspace. I opened Settings by going to File > Preferences > Settings in the menu. I typed python.analysis.extraPaths
in the search box. This showed me the correct section. I clicked on Add Item, entered the required path, and clicked OK. This resolved the problem, and everything worked properly afterwards.
Broadening data science horizons: Useful Python packages for working with data
14th October 2021My response to changes in the technology stack used in clinical research is to develop some familiarity with programming and scripting platforms that complement and compete with SAS, a system with which I have been programming since 2000. While one of these has been R, Python is another that has taken up my attention, and I now also have Julia in my sights as well. There may be others to assess in the fullness of time.
While I first started to explore the Data Science world in the autumn of 2017, it was in the autumn of 2019 that I began to complete LinkedIn training courses on the subject. Good though they were, I find that I need to actually use a tool to better understand it. At that time, I did get to hear about Python packages like Pandas, NumPy, SciPy, Scikit-learn, Matplotlib, Seaborn and Beautiful Soup though it took until of spring of this year for me to start gaining some hands-on experience with using any of these.
During the summer of 2020, I attended a BCS webinar on the CodeGrades initiative, a programming mentoring scheme inspired by the way classical musicianship is assessed. In fact, one of the main progenitors is a trained classical musician and teacher of classical music who turned to Python programming when starting a family to have a more stable income. The approach is that a student selects a project and works their way through it, with mentoring and periodic assessments carried out in a gentle and discursive manner. Of course, the project has to be engaging for the learning experience to stay the course, and that point came through in the webinar.
That is one lesson that resonates with me with subjects as diverse as web server performance and the ongoing pandemic supplying data, and there are other sources of public data to examine as well before looking through my own personal archive gathered over the decades. Though some subjects are uplifting while others are more foreboding, the key thing is that they sustain interest and offer opportunities for new learning. Without being able to dream up new things to try, my knowledge of R and Python would not be as extensive as it is, and I hope that it will help with learning Julia too.
In the main, my own learning has been a solo effort with consultation of documentation along with web searches that have brought me to the likes of Real Python, Stack Abuse, Data Viz with Python and R and others for longer tutorials as well as threads on Stack Overflow. Usually, the web searching begins when I need a steer on a particular or a way to resolve a particular error or warning message, but books are always worth reading even if that is the slower route. While those from the Dummies series or from O'Reilly have proved must useful so far, I do need to read them more completely than I already have; it is all too tempting to go with the try the "programming and search for solutions as you go" approach instead.
To get going, many choose the Anaconda distribution to get Jupyter notebook functionality, but I prefer a more traditional editor, so Spyder has been my tool of choice for Python programming and there are others like PyCharm as well. Because Spyder itself is written in Python, it can be installed using pip from PyPi like other Python packages. It has other dependencies like Pylint for code management activities, but these get installed behind the scenes.
The packages that I first met in 2019 may be the mainstays for doing data science, but I have discovered others since then. It also seems that there is porosity between the worlds of R and Python, so you get some Python packages aping R packages and R has the Reticulate package for executing Python code. There are Python counterparts to such Tidyverse stables as dplyr and ggplot2 in the form of Siuba and Plotnine, respectively. Though the syntax of these packages are not direct copies of what is executed in R, they are close enough for there to be enough familiarity for added user-friendliness compared to Pandas or Matplotlib. The interoperability does not stop there, for there is SQLAlchemy for connecting to MySQL and other databases (PyMySQL is needed as well) and there also is SASPy for interacting with SAS Viya.
While Python may not have the speed of Julia, there are plenty of packages for working with larger workloads. Of these, Dask, Modin and RAPIDS all have their uses for dealing with data volumes that make Pandas code crawl. As if to prove that there are plenty of libraries for various forms of data analytics, data science, artificial intelligence and machine learning, there also are the likes of Keras, TensorFlow and NetworkX. These are just a selection of what is available, and there is always the possibility of checking out others. It may be tempting to stick with the most popular packages all the time, especially when they do so much, but it never hurts to keep an open mind either.
Expanding the coding toolkit: Adding R and Python in a changing landscape
10th April 2021Over the years, I have taught myself a number of computing languages, with some coming in useful for professional work while others came in handy for website development and maintenance. The collection has grown to include HTML, CSS, XML, Perl, PHP and UNIX Shell Scripting. The ongoing pandemic allowed to me add two more to the repertoire: R and Python.
My interest in these arose from my work as an information professional concerned with standardisation and automation of statistical results delivery. To date, the main focus has been on clinical study data, but ongoing changes in the life sciences sector could mean that I may need to look further afield, so having extra knowledge never hurts. Though I have been a SAS programmer for more than twenty years, its predominance in the clinical research field is not what it was, which causes me to rethink things.
As it happens, I would like to continue working with SAS since it does so much and thoughts of leaving it after me bring sadness. It also helps to know what the alternatives might be and to reject some management hopes about any newcomers, especially regarding the amount of code being produced and the quality of graphs being created. Use cases need to be assessed dispassionately, even when emotions loom behind the scenes.
Since both R and Python bring large scripting ecosystems with active communities, the attraction of their adoption makes a deal of sense. SAS is comparable in the scale of its own ecosystem, though there are considerable differences and the platform is catching up when it comes to Data Science. While the aforementioned open-source languages may have had a head start, it appears that others are not standing still either. It is a time to have wider awareness, and online conference attendance helps with that.
The breadth of what is available for any programming language more than stymies any attempt to create a truly all encompassing starting point, and I have abandoned thoughts of doing anything like that for R. Similarly, I will not even try such a thing for Python. Consequently, this means that my sharing of anything learned will be in the form of discrete postings from time to time, especially given ho easy it is to collect numerous website links for sharing.
The learning has been facilitated by ongoing pandemic restrictions, though things are opening up a little now. The pandemic also has given us public data that can be used for practice, since much can be gained from having one's own project instead of completing exercises from a book. Having an interesting data set with which to work is a must, and COVID-19 data contain a certain self-interest as well, while one remains mindful of the suffering and loss of life that has been happening since the pandemic first took hold.
Using multi-line commenting in Perl to inactivate blocks of code during testing
26th December 2019Recently, I needed to inactivate blocks of code in a Perl script while doing some testing. Since this is something that I often do in other computing languages, I sought the same in Perl. To accomplish that, I need to use the POD methodology. This meant enclosing the code as follows.
=start
<< Code to be inactivated by inclusion in a comment >>
=cut
While the =start
line could use any word after the equality sign, it seems that =cut
is required to close the multi-line comment. If this was actual programming documentation, then the comment block should include some meaningful text for use with perldoc
. However, that was not a concern here because the commenting statements would be removed afterwards anyway. It also is good practice not to leave commented code in a production script or program to avoid any later confusion.
In my case, this facility allowed me to isolate the code that I had to alter and test before putting everything back as needed. It also saved time since I did not need to individually comment out every executable line because multiple lines could be inactivated at a time.