Technology Tales

Adventures & experiences in contemporary technology

Moves to Hugo

30th November 2022

What amazes me is how things can become more complicated over time. As long as you knew HTML, CSS and JavaScript, building a website was not as onerous as long as web browsers played ball with it. Since then, things have got easier to use but more complex at the same time. One example is WordPress: in the early days, themes were much simpler than they are now. The web also has got more insecure over time, and that adds to complexity as well. It sometimes feels as if there is a choice to make between ease of use and simplicity.

It is against that background that I reassessed the technology that I was using on my public transport and Irish history websites. The former used WordPress, while the latter used Drupal. The irony was that the simpler website was using the more complex platform, so the act of going simpler probably was not before time. Alternatives to WordPress were being surveyed for the first of the pair, but none had quite the flexibility, pervasiveness and ease of use that WordPress offers.

There is another approach that has been gaining notice recently. One part of this is the use of Markdown for web publishing. This is a simple and distraction-free plain text format that can be transformed into something more readable. It sees usage in blogs hosted on GitHub, but also facilitates the generation of static websites. The clutter is absent for those who have no need of the Gutenberg Editor on WordPress.

With the content written in Markdown, it can be fed to a static website generator like Hugo. Using defined templates and fixed assets like CSS together with images and other static files, it can slot the content into HTML files very speedily since it is written in the Go programming language. Once you get acclimatised, there are no folder structures that cannot be used, so you get full flexibility in how you build out your website. Sitemaps and RSS feeds can be built at the same time, both using the same input as the HTML files.

In a nutshell, it automates what once needed manual effort used a code editor or a visual web page editor. The use of HTML snippets and layouts means that there is no necessity for hand-coding content, like there was at the start of the web. It also helps that Bootstrap can be built in using Node, so that gives a basis for any styling. Then, SCSS can take care of things, giving even more automation.

Given that there is no database involved in any of this, the required information has to be stored somewhere, and neither the Markdown content nor the layout files contain all that is needed. The main site configuration is defined in a single TOML file, and you can have a single one of these for every publishing destination; I have development and production servers, which makes this a very handy feature. Otherwise, every Markdown file needs a YAML header where titles, template references, publishing status and other similar information gets defined. The layouts then are linked to their components, and control logic and other advanced functionality can be added too.

Because static files are being created, it does mean that site searching and commenting, or contact pages cannot work like they would on a dynamic web platform. Often, external services are plugged in using JavaScript. One that I use for contact forms is Getform.io. Then, Zapier has had its uses in using the RSS feed to tweet site updates on Twitter when new content gets added. Though I made different choices, Disqus can be used for comments and Algolia for site searching. Generally, though, you can find yourself needing to pay, particularly if you need to remove advertising or gain advanced features.

Some comments service providers offer open source self-hosted options, but I found these difficult to set up and ended up not offering commenting at all. That was after I tried out Cactus Comments only to find that it was not discriminating between pages, so it showed the same comments everywhere. There are numerous alternatives like Remark42, Hyvor Talk, Commento, FastComments, Utterances, Isso, Mouthful, Muut and HyperComments but trying them all out was too time-consuming for what commenting was worth to me. It also explains why some static websites even send readers to Twitter if they have something to say, though I have not followed this way of working.

For searching, I added a JavaScript/JSON self-hosted component to the transport website, and it works well. However, it adds to the size of what a browser needs to download. That is not a major issue for desktop browsers, but the situation with mobile browsers is such that it has a sizeable effect. Testing with PageSpeed and Lighthouse highlighted this, even if I left things as they are. The solution works well in any case.

One thing that I have yet to work out is how to edit or add content while away from home. Editing files using an SSH connection is as much a possibility as setting up a Hugo publishing setup on a laptop. After that, there is the question of using a tablet or phone, since content management systems make everything web based. These are points that I have yet to explore.

As is natural with a code-based solution, there is a learning curve with Hugo. Reading a book provided some orientation, and looking on the web resolved many conundrums. There is good documentation on the project website, while forum discussions turn up on many a web search. Following any research, there was next to nothing that could not be done in some way.

Migration of content takes some forethought and took quite a bit of time, though there was an opportunity to carry some housekeeping as well. The history website was small, so copying and pasting sufficed. For the transport website, I used Python to convert what was on the database into Markdown files before refining the result. That provided some automation, but left a lot of work to be done afterwards.

The results were satisfactory, and I like the associated simplicity and efficiency. That Hugo works so fast means that it can handle large websites, so it is scalable. The new Markdown method for content production is not problematical so far apart from the need to make it more portable, and it helps that I found a setup that works for me. This also avoids any potential dealbreakers that continued development of publishing platforms like WordPress or Drupal could bring. For the former, I hope to remain with the Classic Editor indefinitely, but now have another option in case things go too far.

A quick look at the 7G Firewall

17th October 2021

There is a simple principal with the 7G Firewall from Perishable press: it is a set of mod_rewrite rules for the Apache web server that can be added to a .htaccess file and there also is a version for the Nginx web server as well. These check query strings, request Uniform Resource Identifiers (URI’s), user agents, remote hosts, HTTP referrers and request methods for any anomalies and blocks those that appear dubious.

Unfortunately, I found that the rules heavily slowed down a website with which I tried them so I am going have to wait until that is moved to a faster system before I really can give them a go. This can be a problem with security tools as I also found with adding a modsec jail to a Fail2Ban instance. As it happens, both sets of observations were made using the GTmetrix tool so it seems that there is a trade off between security and speed that needs to be assessed before adding anything to block unwanted web visitors.

Some online writing tools

15th October 2021

Every week, I get an email newsletter from Woody’s Office Watch. This was something to which I started subscribing in the 1990’s but I took a break from it for a good while for reasons that I cannot recall and returned to it only in recent years. This week’s issue featured a list of online paraphrasing tools that are part of what is offered by Quillbot, Paraphraser, Dupli Checker and Pre Post Seo. Each got their own reviews in the newsletter so I will just outline other features in this posting.

In Quillbot’s case, the toolkit includes a grammar checker, summary generator, and citation generator. In addition to the online offering, there are extensions for Microsoft Word, Google Chrome, and Google Docs. In addition to the free version, a paid subscription option is available.

In spite of the name, Paraphraser is about more than what the title purports to do. There is article rewriting, plagiarism checking, grammar checking and text summarisation. Because there is no premium version, the offering is funded by advertising and it will not work with an ad blocker enabled. The mention of plagiarism suggests a perhaps murkier side to writing that cuts both ways: one is to avoid copying other work while another is the avoidance of groundless accusations of copying.

It was appear that the main role of Dupli Checker is to avoid accusations of plagiarism by checking what you write yet there is a grammar checker as well as a paraphrasing tool on there too. When I tried it, the English that it produced looked a little convoluted and there is a lack of fluency in what is written on its website as well. Together with a free offering that is supported by ads that were not blocked by my ad blocker, there are premium subscriptions too.

In web publishing, they say that content is king so the appearance of an option using the acronym for Search Engine Optimisation in it name may not be as strange as it might as first glance. There are numerous tools here with both free and paid tiers of service. While paraphrasing and plagiarism checking get top billing in the main menu on the home page, further inspection reveals that there is a lot more to check on this site.

In writing, inspiration is a fleeting and ephemeral quantity so anything that helps with this has to be of interest. While any rewriting of initial content may appear less smooth than the starting point, any help with the creation process cannot go amiss. For that reason alone, I might be tempted to try these tools from time to time and they might assist with proof reading as well because that can be a hit and miss affair for some.

 

A new look

11th October 2021

Things have been changing on here. Much of that has been behind the scenes with a move to a new VPS for extra speed and all the upheaval that brings. It also gained me a better system for less money than the old upgrade path was costing me and everything feels more responsive as well. Extra work has gone into securing the website as well and I have learned a lot as that has progressed. New lessons were added to older, and sometimes forgotten, ones.

The more obvious change for those who have been here before is that the visual appearance has been refreshed. A new theme has been applied with a multitude of tweaks to make it feel unique and to iron out any rough edges that there may be. This remains a WordPress-based website and new theme is a variant of the Appointee subtheme of the Appointment theme. WordPress does only supports child theming but not grandchild theming so I had to make a copy of Appointee of my own so I could modify things as I see fit.

To my eyes, things do look cleaner, crisper and brighter so I hope that it feels the same to you. Like so many designs these days, the basis is the Bootstrap framework and that is no bad thing in my mind though the standardisation may be too much for some tastes. What has become challenging is that it is getter harder to find new spins on more traditional layouts with everything going for a more magazine-like appearance and summaries being shown on the front page instead of complete articles. That probably reflects how things are going for websites these days so it may be that the next refresh could be more home grown and that is a while away yet.

As the website heads towards its sixteenth year, there is bound to be continuing change. In some ways, I prefer that some things remain unchanged so I use the classic editor instead of Gutenburg because that works best for me. Block-based editing is not for me since I prefer to tinker with code anyway. Still, not all of its influences can be avoided and I have needed to figure out the new widgets interface. It did not feel that intuitive but I suppose that I will grow accustomed to it.

My interest in technology continues even if it saddens me at time and some things do not impress me; the Windows 11 taskbar is one of those so I will not be in any hurry to move away from Windows 10. Still, the pandemic has offered its own learning with virtual conferencing allowing one to lurk and learn new things. For me, this has included R, Python, Julia and DevOps among other things. That proved worthwhile during a time with many restrictions. All that could yield more content yet and some already is on the way.

As ever, it is my own direct working with technology that yields some real niche ideas that others have not covered. With so many technology blogs out there, they may be getting less and less easy to find but everyone has their own journey so I hope to encounter more of them. There remain times when doing precedes telling and that is how it is on here. It is not all about appearances since content matters as much as it ever did.

A little bit of abstraction

21st August 2021

A little bit of abstraction

Data science has remained in my awareness since 2017 though my work is more on its fringes in clinical research. In fact, I have been involved more in the standardisation and automation of more traditional data reporting than in the needs of data modelling such as data engineering or other similar disciplines. Much of this effort has meant the use of SAS, with which I have programmed since 2000 and for which I have a licence (an expensive commodity, it has to be said), but other technologies are being explored with R, Python and Julia being among them.

The change in technological scope does bring an element of excitement and new interest but there is also some sadness when tried and trusted technologies meet with newer competition and valued skills are no longer as career securing as they once were. Still, there is plenty of online training out there and I already have collected some of my thoughts on this. The learning continues and the need for repositioning is also clear.

A little bit of abstraction

A little bit of abstraction

The journey also has brought some curios to my notice. One of these is This Person Does Not Exist, a website building photos of non-existent faces using machine learning. Recently, I learned of others like it such as This Artwork Does Not Exist, This Cat Does Not Exist, This Horse Does Not Exist, and This Chemical Does Not Exist. The last of these probably should be entitled “This Molecule Does Not Exist (Yet)” since it is a fictitious molecular structure that has been created and what you get is an actual moving image that spins it around in three-dimensional space. The one with dynamically generated abstract art is the main inspiration for this piece and is of more interest to me while the other two are more explanatory though the horse website is not so successful in its execution and one can ask why we need more cat pictures.

To some, the idea of creating fake pictures may feel a little foreboding and that especially applies to photos of people and the livelihoods of any content creators. Nevertheless, these sources of imagery have their legitimate uses such as decorating websites or brochures and that is where my interest is piqued. After all, there are some subjects where pictures can be scarce so any form of decoration that enlivens an article has to have some use. Technology websites like this one can feature images too with screenshots and device photos being commonplace but they can all look like each other, hence the need for a little more variety and having pictures often increases the choice of website themes as well since so many need images to make them work or stand out. As ever, being sparing with any new innovations remains in order so that is how I approach this matter as well.

Using .htaccess to control hotlinking

10th October 2020

There are times when blogs cease to exist and the only place to find the content is on the Wayback Machine. Even then, it is in danger of being lost completely. One such example is the subject of this post.

Though this website makes use of the facilities of Cloudflare for various functions that include the blocking of image hotlinking, the same outcome can be achieved using .htaccess files on Apache web servers. It may work on Nginx to a point too but there are other configuration files that ought to be updated instead of using a .htaccess when some frown upon the approach. In any case, the lines that need adding to .htaccess are listed below though the web address needs to include your own domain in place of the dummy example provided:

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?yourdomain.com(/)?.*$ [NC]
RewriteRule .*\.(gif|jpe?g|png|bmp)$ [F,NC]

The first line turns on the mod_rewrite engine and you may have that done anyway. Of course, the module needs enabling in your Apache configuration for this to work and you have to be allowed to perform the required action as well. This means changing the Apache configuration files. The next pair of lines look at the HTTP referer strings and the third one only allows images to be served from your own web domain and not others. To add more, you need to copy the third line and change the web address accordingly. Any new lines need to precede the last line that defines the file extensions that are to be blocked to other web addresses.

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?yourdomain.com(/)?.*$ [NC]
RewriteRule \.(gif|jpe?g|png|bmp)$ /images/image.gif [L,NC]

Another variant of the previous code involves changing the last line to display a default image showing others what is happening. That may not reduce the bandwidth usage as much as complete blocking but it may be useful for telling others what is happening.

Adding a new domain or subdomain to an SSL certificate using Certbot

11th June 2019

On checking the Site Health page of a WordPress blog, I saw errors that pointed to a problem with its SSL set up. The www subdomain was not included in the site’s certificate and was causing PHP errors as a result though they had no major effect on what visitors saw. Still, it was best to get rid of them so I needed to update the certificate as needed. Execution of a command like the following did the job:

sudo certbot --expand -d existing.com,www.example.com

Using a Let’s Encrypt certificate meant that I could use the certbot command since that already was installed on the server. The --expand and -d switches ensured that the listed domains were added to the certificate to sort out the observed problem. In the above, a dummy domain name is used but this was replaced by the real one to produce the desired effect and make things as they should have been.

Running cron jobs using the www-data system account

22nd December 2018

When you set up your own web server or use a private server (virtual or physical), you will find that web servers run using the www-data account. That means that website files need to be accessible to that system account if not owned by it. The latter is mandatory if you you want WordPress to be able to update itself with needing FTP details.

It also means that you probably need scheduled jobs to be executed using the privileges possess by the www-data account. For instance, I use WP-CLI to automate spam removal and updates to plugins, themes and WordPress itself. Spam removal can be done without the www-data account but the updates need file access and cannot be completed without this. Therefore, I got interested in setting up cron jobs to run under that account and the following command helps to address this:

sudo -u www-data crontab -e

For that to work, your own account needs to be listed in /etc/sudoers or be assigned to the sudo group in /etc/group. If it is either of those, then entering your own password will open the cron file for www-data and it can be edited as for any other account. Closing and saving the session will update cron with the new job details.

In fact, the same approach can be taken for a variety of commands where files only can be access using www-data. This includes copying, pasting and deleting files as well as executing WP-CLI commands. The latter issues a striking message if you run a command using the root account, a pervasive temptation given what it allows. Any alternative to the latter has to be better from a security standpoint.

Moving a website from shared hosting to a virtual private server

24th November 2018

This year has seen some optimisation being applied to my web presences guided by the results of GTMetrix scans. It was then that I realised how slow things were, so server loads were reduced. Anything that slowed response times, such as WordPress plugins, got removed. Usage of Matomo also was curtailed in favour of Google Analytics while HTML, CSS and JS minification followed. What had yet to happen was a search for a faster server. Now, another website has been moved onto a virtual private server (VPS) to see how that would go.

Speed was not the only consideration since security was a factor too. After all, a VPS is more locked away from other users than a folder on a shared server. There also is the added sense of control, so Let’s Encrypt SSL certificates can be added using the Electronic Frontier Foundation’s Certbot. That avoids the expense of using an SSL certificate provided through my shared hosting provider and a successful transition for my travel website may mean that this one undergoes the same move.

For the VPS, I chose Ubuntu 18.04 as its operating system and it came with the LAMP stack already in place. Have offload development websites, the mix of Apache, MySQL and PHP is more familiar to me than anything using Nginx or Python. It also means that .htaccess files become more useful than they were on my previous Nginx-based platform. Having full access to the operating system by means of SSH helps too and should mean that I have fewer calls on technical support since I can do more for myself. Any extra tinkering should not affect others either, since this type of setup is well known to me and having an offline counterpart means that anything riskier is tried there beforehand.

Naturally, there were niggles to overcome with the move. The first to fix was to make the MySQL instance accept calls from outside the server so that I could migrate data there from elsewhere and I even got my shared hosting setup to start using the new database to see what performance boost it might give. To make all this happen, I first found the location of the relevant my.cnf configuration file using the following command:

find / -name my.cnf

Once I had the right file, I commented out the following line that it contained and restarted the database service afterwards using another command to stop the appearance of any error 111 messages:

bind-address 127.0.0.1
service mysql restart

After that, things worked as required and I moved onto another matter: uploading the requisite files. That meant installing an FTP server so I chose proftpd since I knew that well from previous tinkering. Once that was in place, file transfer commenced.

When that was done, I could do some testing to see if I had an active web server that loaded the website. Along the way, I also instated some Apache modules like mod-rewrite using the a2enmod command, restarting Apache each time I enabled another module.

Then, I discovered that Textpattern needed php-7.2-xml installed, so the following command was executed to do this:

apt install php7.2-xml

Then, the following line was uncommented in the correct php.ini configuration file that I found using the same method as that described already for the my.cnf configuration and that was followed by yet another Apache restart:

extension=php_xmlrpc.dll

Addressing the above issues yielded enough success for me to change the IP address in my Cloudflare dashboard so it pointed at the VPS and not the shared server. The changeover happened seamlessly without having to await DNS updates as once would have been the case. It had the added advantage of making both WordPress and Textpattern work fully.

With everything working to my satisfaction, I then followed the instructions on Certbot to set up my new Let’s Encrypt SSL certificate. Aside from a tweak to a configuration file and another Apache restart, the process was more automated than I had expected so I was ready to embark on some fine-tuning to embed the new security arrangements. That meant updating .htaccess files and Textpattern has its own, so the following addition was needed there:

RewriteCond %{HTTPS} !=on
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

This complemented what was already in the main .htaccess file and WordPress allows you to include http(s) in the address it uses, so that was another task completed. The general .htaccess only needed the following lines to be added:

RewriteCond %{SERVER_PORT} 80
RewriteRule ^(.*)$ https://www.assortedexplorations.com/$1 [R,L]

What all these achieve is to redirect insecure connections to secure ones for every visitor to the website. After that, internal hyperlinks without https needed updating along with any forms so that a padlock sign could be shown for all pages.

With the main work completed, it was time to sort out a lingering niggle regarding the appearance of an FTP login page every time a WordPress installation or update was requested. The main solution was to make the web server account the owner of the files and directories, but the following line was added to wp-config.php as part of the fix even if it probably is not necessary:

define('FS_METHOD', 'direct');

There also was the non-operation of WP Cron and that was addressed using WP-CLI and a script from Bjorn Johansen. To make double sure of its effectiveness, the following was added to wp-config.php to turn off the usual WP-Cron behaviour:

define('DISABLE_WP_CRON', true);

Intriguingly, WP-CLI offers a long list of possible commands that are worth investigating. A few have been examined but more await attention.

Before those, I still need to get my new VPS to send emails. So far, sendmail has been installed, the hostname changed from localhost and the server restarted. More investigations are needed but what I have not is faster than what was there before, so the effort has been rewarded already.

Improving a website contact form

23rd April 2018

On another website, I have had a contact form but it was missing some functionality. For instance, it stored the input in files on a web server instead of emailing them. That was fixed more easily than expected using the PHP mail function. Even so, it remains useful to survey corresponding documentation on the w3schools website.

The other changes affected the way the form looked to a visitor. There was a reset button and that was removed on finding that such things are out of favour these days. Thinking again, there hardly was any need for it any way.

Newer additions that came with HTML5 had their place too. Including user hints using the placeholder attribute should add some user friendliness although I have avoided experimenting with browser-powered input validation for now. Use of the required attribute has its uses for tell a visitor that they have forgotten something but I need to check how that is handled in CSS more thoroughly before I go with that since there are new :required, :optional, :valid and :invalid pseudoclasses that can be used to help.

It seems that there is much more to learn about setting up forms since I last checked. This is perhaps a hint that a few books need reading as part of catching with how things are done these days. There also is something new to learn.

  • All the views that you find expressed on here in postings and articles are mine alone and not those of any organisation with which I have any association, through work or otherwise. As regards editorial policy, whatever appears here is entirely of my own choice and not that of any other person or organisation.

  • Please note that everything you find here is copyrighted material. The content may be available to read without charge and without advertising but it is not to be reproduced without attribution. As it happens, a number of the images are sourced from stock libraries like iStockPhoto so they certainly are not for abstraction.

  • With regards to any comments left on the site, I expect them to be civil in tone of voice and reserve the right to reject any that are either inappropriate or irrelevant. Comment review is subject to automated processing as well as manual inspection but whatever is said is the sole responsibility of the individual contributor.