Technology Tales

Adventures & experiences in contemporary technology

Picking out a word from a string by its position using BASH scripting

28th March 2023

Recently, wanting to execute one command using the text output of another got me wondering about picking out a block of characters by its position in a space-delimited list. All this needed to be done from the Linux command line or in a shell script. The output text took a form like the following:

text1 text2 text3 text4

What I wanted in my case was something like the third word above. The solution was to use the cut command with the -d (for delimiter) and -f (for field number) switches. The following yields text3 as the output:

echo "text1 text2 text3 text4" | cut -d " " -f 3

Here, the delimiter is the space character, but it can be anything that is relevant for the string in question. Then, the “3” picks out the required block of text. For this to work, the text needs to be organised consistently and the delimiters must never be duplicated, though there is a way of dealing with the latter as well.
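One way of dealing with duplicated delimiters, offered as a minimal sketch here, is to squeeze runs of spaces into single ones with tr before handing the text to cut:

echo "text1  text2   text3 text4" | tr -s " " | cut -d " " -f 3

The -s switch tells tr to squeeze repeated occurrences of the given character, so the doubled and tripled spaces above collapse into single ones and cut again yields text3.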

Moves to Hugo

30th November 2022

What amazes me is how things can become more complicated over time. As long as you knew HTML, CSS and JavaScript, building a website once was not so onerous, provided web browsers played ball with it. Since then, things have got easier to use but more complex at the same time. One example is WordPress: in the early days, themes were much simpler than they are now. The web also has become more insecure over time, and that adds to complexity as well. It sometimes feels as if there is a choice to make between ease of use and simplicity.

It is against that background that I reassessed the technology that I was using on my public transport and Irish history websites. The former used WordPress, while the latter used Drupal. The irony was that the simpler website was using the more complex platform, so the move to something simpler probably was overdue. I surveyed alternatives to WordPress for the first of the pair, but none had quite the flexibility, pervasiveness and ease of use that WordPress offers.

There is another approach that has been gaining notice recently. One part of this is the use of Markdown for web publishing. This is a simple and distraction-free plain text format that can be transformed into something more readable. It sees usage in blogs hosted on GitHub, but also facilitates the generation of static websites. The clutter is absent for those who have no need of the Gutenberg Editor on WordPress.

With the content written in Markdown, it can be fed to a static website generator like Hugo. Using defined templates and fixed assets like CSS together with images and other static files, it can slot the content into HTML files very speedily, helped by being written in the Go programming language. Once you get acclimatised, there are no folder structures that cannot be used, so you get full flexibility in how you build out your website. Sitemaps and RSS feeds can be built at the same time, both using the same input as the HTML files.
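For orientation, the basic workflow from the command line looks something like the following; the site and content names are placeholders for illustration:

hugo new site mysite          # scaffold a new site in a folder called mysite
cd mysite
hugo new posts/first-post.md  # create a Markdown content file
hugo server                   # preview locally with automatic rebuilding
hugo                          # build the static HTML into the public folder

The last command is the one that turns Markdown, templates and assets into the finished static files.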

In a nutshell, it automates what once needed manual effort using a code editor or a visual web page editor. The use of HTML snippets and layouts means that there is no necessity for hand-coding content, like there was at the start of the web. It also helps that Bootstrap can be built in using Node, so that gives a basis for any styling. Then, SCSS can take care of things, giving even more automation.
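As an illustration, and assuming a Node.js setup is already in place, pulling in Bootstrap is a one-liner; wiring it into Hugo's asset pipeline then depends on your own templates:

npm install bootstrap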

Given that there is no database involved in any of this, the required information has to be stored somewhere, and neither the Markdown content nor the layout files contain all that is needed. The main site configuration is defined in a single TOML file, and you can have a single one of these for every publishing destination; I have development and production servers, which makes this a very handy feature. Otherwise, every Markdown file needs a YAML header where titles, template references, publishing status and other similar information gets defined. The layouts then are linked to their components, and control logic and other advanced functionality can be added too.
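To make this concrete, here is a minimal sketch of what such files might contain, with all the values invented for illustration. First, a site-level config.toml:

baseURL = "https://www.example.com/"
languageCode = "en-gb"
title = "Example Site"

Then, the YAML header at the top of a Markdown content file:

---
title: "An Example Post"
date: 2022-11-30
draft: false
---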

Because static files are being created, it does mean that site searching and commenting or contact pages cannot work as they would on a dynamic web platform. Often, external services are plugged in using JavaScript. One that I use for contact forms is Getform.io. Zapier also has had its uses, taking the RSS feed and tweeting site updates on Twitter when new content gets added. Though I made different choices, Disqus can be used for comments and Algolia for site searching. Generally, though, you can find yourself needing to pay, particularly if you need to remove advertising or gain advanced features.

Some comments service providers offer open-source self-hosted options, but I found these difficult to set up and ended up not offering commenting at all. That was after I tried out Cactus Comments, only to find that it was not discriminating between pages, so it showed the same comments everywhere. There are numerous alternatives like Remark42, Hyvor Talk, Commento, FastComments, Utterances, Isso, Mouthful, Muut and HyperComments, but trying them all out was too time-consuming for what commenting was worth to me. It also explains why some static websites even send readers to Twitter if they have something to say, though I have not followed this way of working.

For searching, I added a JavaScript/JSON self-hosted component to the transport website, and it works well. However, it adds to the size of what a browser needs to download. That is not a major issue for desktop browsers, but the situation with mobile browsers is such that it has a sizeable effect. Testing with PageSpeed and Lighthouse highlighted this, even if I have left things as they are; the solution works well in any case.

One thing that I have yet to work out is how to edit or add content while away from home. Editing files over an SSH connection is as much a possibility as setting up a Hugo publishing environment on a laptop. After that, there is the question of using a tablet or phone, since content management systems make everything web-based. These are points that I have yet to explore.

As is natural with a code-based solution, there is a learning curve with Hugo. Reading a book provided some orientation, and looking on the web resolved many conundrums. There is good documentation on the project website, while forum discussions turn up on many a web search. After some research, there was next to nothing that could not be done in some way.

Migration of content takes some forethought and took quite a bit of time, though there was an opportunity to carry out some housekeeping as well. The history website was small, so copying and pasting sufficed. For the transport website, I used Python to convert what was on the database into Markdown files before refining the result. That provided some automation, but left a lot of work to be done afterwards.

The results were satisfactory, and I like the associated simplicity and efficiency. That Hugo works so fast means that it can handle large websites, so it is scalable. The new Markdown method for content production has not been problematic so far, apart from the need to make it more portable, and it helps that I found a setup that works for me. This also avoids any potential dealbreakers that continued development of publishing platforms like WordPress or Drupal could bring. For the former, I hope to remain with the Classic Editor indefinitely, but now have another option in case things go too far.

A desktop Markdown editing environment

8th November 2022

Earlier this year, I changed over two websites from dynamic versions using content management systems to static ones by using Hugo to build them from Markdown files. That meant that I needed to look at the editing of Markdown, even if it is a fairly simple file format. For one thing, Grammarly can be incorporated into WordPress, so I did not want to lose something like that.

The latter point meant that I was steered away from plain text editors. Otherwise, there are online ones like StackEdit and Dillinger, but the Firefox Grammarly plugin only appears to work on the first of these, and even then only partially in my experience. Dillinger does offer connections to online file storage providers like Google Drive, Dropbox and OneDrive, but I wanted to store files on my desktop for upload to a web server. It also works with GitHub, but I prefer to use another web hosting provider.

There are various specialised Markdown editors for desktop usage, like Typora, ReText, Formiko or Ghostwriter, but I chose none of these. My actual choice may surprise many: it was Visual Studio Code. The availability of a Grammarly plug-in was what swayed it for me, even if it did need to be switched on for Markdown files. In many ways, it does not work as smoothly as elsewhere, because it gets fooled by links and other code-like pieces of text. Also, having the ability to add words to a custom dictionary would be ideal. Some rule overriding is available, but I am not sure that everything is covered, even if the list of options is lengthy. Some time is needed to inspect all of them before I proceed any further. Thus far, things are working well enough for me.

Using inventory files with Ansible

28th October 2022

This is the second post on Ansible following my main system's upgrade to Linux Mint 21. Then, I manually ran some Ansible playbooks, only to spot messages that I had not noticed before. Here, I discuss two messages issued because of an issue with an inventory file, which is where one defines the lists of servers against which playbooks are executed. The default is called hosts and is located at /etc/ansible, but the system upgrade had renamed the existing one, so Ansible could not find it. The solution was to take a copy and put it somewhere safer. Then, I needed to add the location of the new file to the affected ansible-playbook commands using the following construct:

ansible-playbook [playbook path] -i [inventory file path]
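For reference, a minimal inventory file uses an INI-like layout along these lines, with the group name and server addresses here being placeholders:

# a group of servers that playbooks can target by name
[webservers]
www1.example.com
www2.example.com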

Before I did this, I was seeing messages including the text “Could not match supplied host pattern” or others with the following text:

[WARNING]: No inventory was parsed, only implicit localhost is available

[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'

The cause was the same in each case and attending to the inventory file got rid of the unwanted messages. The new file also should not be affected by system upgrades in the future.

Removing a Julia package

5th October 2022

While I have been programming with SAS for a few decades and it remains a lynchpin in the world of clinical development in the pharmaceutical industry, other technologies like R and Python are gaining a foothold. Two years ago, I started to look at those languages, with personal projects being a great way of facilitating this. In addition, I got to hear of Julia and tried that too. That journey continues, since I have put it to use for importing and backing up photos, and there are other possible uses too.

Recently, I updated Julia to version 1.8.2 but ran into a problem with the DataArrays package that I had installed, so I decided to remove it, since it was added during experimentation. The Pkg package that is used for package management is documented, but I had not gotten to that, so some web searching ensued. It turns out that there are two ways of doing this. One uses the REPL: after pressing the ] key, the following command gets issued:

rm DataArrays

When all is done, pressing the delete or backspace key returns things to normal. This can be done in a script as well as in the REPL, and the following line works in both instances:

using Pkg; Pkg.rm("DataArrays")

The semicolon is used to separate two commands issued on the same line, but they can be on different lines or issued separately just as well. Naturally, DataArrays is just an example here, so you replace that with the name of whatever other package you need to remove. Since we can get carried away when downloading packages, there are times when a clean-up is needed to remove redundant ones, so knowing how to deal with any clutter is invaluable.
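Should you wish to avoid opening the REPL at all, the same line can be executed from the command line too; here is a sketch using the same example package:

julia -e 'using Pkg; Pkg.rm("DataArrays")'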

Accessing Julia REPL command history

4th October 2022

In the BASH shell used on Linux and UNIX, the history command calls up a list of recently used commands and has many uses. There is a .bash_history file in the user's home directory that logs all this information, so there are times when you need to exclude some commands from there, but that is another story.

The Julia REPL environment works similarly to many operating system command line interfaces, so I wondered if there was a way to recall or refer to the history of commands issued. So far, I have not come across an equivalent to the BASH history command for the REPL itself, but the command history is retained in a file like .bash_history. The location varies between operating systems, though. On Linux, it is ~/.julia/logs/repl_history.jl, while it is %USERPROFILE%\.julia\logs\repl_history.jl on Windows. While I tend to use scripts that I have written in VSCode rather than entering pieces of code in the REPL, the history retains its uses, and I am sharing it here for others. The location has changed in the past, but these are the ones for Julia 1.8.2, the version that I have at the time of writing.
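Because it is an ordinary text file, the shell can inspect it directly; on Linux, a quick look at the most recent entries might go like this:

tail -n 20 ~/.julia/logs/repl_history.jl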

Changing the Ansible Vault editor from Vi to Nano

15th August 2022

Recently, I got to experimenting with Ansible after reading about the orchestration tool in a copy of Admin magazine. It came in handy for updating a few web servers that I have, as well as for updating my main Linux workstation. For the former, automated entry of SSH passwords sufficed, but the same did not apply to sudo usage on my local machine. This meant that I needed to use Ansible Vault to store the administrator password, and doing so opened up a file in the Vi editor. Since I am not familiar with Vi and wanted to get things sorted quickly, I fancied using something more user-friendly like Nano.

Doing this meant adding the following line to .bashrc:

export EDITOR=nano

Saving and closing the file, followed by reloading the session, set me up for what was needed.
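Reloading need not mean logging out and back in; sourcing the file in the current shell has the same effect:

source ~/.bashrc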

Automated entry of SSH passwords

17th February 2022

One thing that is very handy for shell scripting is automated entry of passwords for logging into other servers. This can involve using plain text files, which is not always ideal, so it was good to find an alternative. The first step is to use the keygen tool that comes with SSH. The command is given below, and the -t switch specifies the type of key to be made, RSA in this case. There is the option to add a passphrase, but I decided against this for the sake of convenience; you do need to assess your own security needs before embarking on such a course of action.

ssh-keygen -t rsa

The next step is to use the ssh-copy-id command to copy the public key to the server for a given set of login credentials. For this, it is better to use a user account with restricted access, to preserve as much server security as you can. Otherwise, the process is as simple as executing a command like the following and entering the password when prompted.

ssh-copy-id [user ID]@[server address]

Getting this set up has been useful for running a file upload script that keeps a web server synchronised, and it is better to rely on keys than to keep credentials in a plain text file.
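As a sketch of the sort of thing that the key enables, a synchronisation step in such a script might use rsync over SSH; the paths here are placeholders, following the pattern above:

rsync -avz --delete ~/website/ [user ID]@[server address]:/var/www/html/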

A quick look at the 7G Firewall

17th October 2021

There is a simple principle behind the 7G Firewall from Perishable Press: it is a set of mod_rewrite rules for the Apache web server that can be added to a .htaccess file, and there is a version for the Nginx web server as well. These check query strings, request Uniform Resource Identifiers (URIs), user agents, remote hosts, HTTP referrers and request methods for any anomalies and block those that appear dubious.

Unfortunately, I found that the rules heavily slowed down a website with which I tried them, so I am going to have to wait until that is moved to a faster system before I really can give them a go. This can be a problem with security tools, as I also found when adding a modsec jail to a Fail2Ban instance. As it happens, both sets of observations were made using the GTmetrix tool, so it seems that there is a trade-off between security and speed that needs to be assessed before adding anything to block unwanted web visitors.

Useful Python packages for working with data

14th October 2021

My response to changes in the technology stack used in clinical research is to develop some familiarity with programming and scripting platforms that complement and compete with SAS, a system with which I have been programming since 2000. One of these has been R, but Python is another that has taken up my attention, and I now have Julia in my sights as well. There may be others to assess in the fullness of time.

While I first started to explore the Data Science world in the autumn of 2017, it was in the autumn of 2019 that I began to complete LinkedIn training courses on the subject. Good though they were, I find that I need to actually use a tool in order to understand it better. At that time, I did get to hear about Python packages like Pandas, NumPy, SciPy, Scikit-learn, Matplotlib, Seaborn and Beautiful Soup, though it took until the spring of this year for me to start gaining some hands-on experience with using any of these.

During the summer of 2020, I attended a BCS webinar on the CodeGrades initiative, a programming mentoring scheme inspired by the way classical musicianship is assessed. In fact, one of the main progenitors is a trained classical musician and music teacher who turned to Python programming when starting a family, so as to have a more stable income. The approach is that a student selects a project and works their way through it, with mentoring and periodic assessments carried out in a gentle and discursive manner. Of course, the project has to be engaging for the learning experience to stay the course, and that point came through in the webinar.

That is one lesson that resonates with me, with subjects as diverse as web server performance and the ongoing pandemic supplying data, and there are other sources of public data to examine as well before looking through my own personal archive gathered over the decades. Some subjects are uplifting, while others are more foreboding, but the key thing is that they sustain interest and offer opportunities for new learning. Without being able to dream up new things to try, my knowledge of R and Python would not be as extensive as it is, and I hope that it will help with learning Julia too.

In the main, my own learning has been a solo effort, with consultation of documentation along with web searches that have brought me to the likes of Real Python, Stack Abuse, Data Viz with Python and R and others for longer tutorials, as well as threads on Stack Overflow. Usually, the web searching begins when I need a steer on a particular point or a way to resolve a particular error or warning message, but books always are worth reading, even if that is the slower route. Those from the Dummies series or from O’Reilly have proved most useful so far, but I do need to read them more completely than I already have; it is all too tempting to go with the “program and search for solutions as you go” approach instead.

To get going, many choose the Anaconda distribution to get Jupyter notebook functionality, but I prefer a more traditional editor, so Spyder has been my tool of choice for Python programming; there are others like PyCharm as well. Spyder itself is written in Python, so it can be installed using pip from PyPI like other Python packages. It has other dependencies, like Pylint for code management activities, but these get installed behind the scenes.
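Assuming a working Python installation, getting Spyder this way is a single command, with pip resolving the dependencies in the background:

pip install spyder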

The packages that I first met in 2019 may be the mainstays for doing data science, but I have discovered others since then. It also seems that there is porosity between the worlds of R and Python, so you get some Python packages aping R packages, while R has the Reticulate package for executing Python code. There are Python counterparts to such Tidyverse staples as dplyr and ggplot2 in the form of Siuba and Plotnine, respectively. The syntax of these packages is not a direct copy of what is executed in R, but it is close enough for there to be added user-friendliness compared to Pandas or Matplotlib. The interoperability does not stop there, for there is SQLAlchemy for connecting to MySQL and other databases (PyMySQL is needed as well), and there also is SASPy for interacting with SAS Viya.
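The Python packages named above install in the usual way; as a sketch, and assuming pip is available, a single command fetches the lot:

pip install siuba plotnine sqlalchemy pymysql saspy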

Python may not have the speed of Julia, but there are plenty of packages for working with larger workloads. Of these, Dask, Modin and RAPIDS all have their uses for dealing with data volumes that make Pandas code crawl. As if to prove that there are plenty of libraries for various forms of data analytics, data science, artificial intelligence and machine learning, there also are the likes of Keras, TensorFlow and NetworkX. These are just a selection of what is available, and there is no reason not to check out more. It may be tempting to stick with the most popular packages all the time, especially when they do so much, but it never hurts to keep an open mind either.
