Technology Tales

Adventures & experiences in contemporary technology

Building a sitemap in XML

24th November 2022

While there are many tools that will build XML site maps, there is some satisfaction to be had in creating your own. This is in spite of there being a multitude of search engine optimisation plugins for content management systems like WordPress or what is built into static site generators like Hugo. Sometimes, building your own allows for added simplicity and that is shared with recent efforts in WordPress theme development.

The sitemap XML protocol is simple enough to offer a short coding project. The basis was what Hugo generates and I used Python to create the XML files. The only libraries that I needed were configparser, SQLAlchemy and pandas. the first two of these allowed databases to be queried and the last on the list was used for data processing. Otherwise, it was a case of using what is built into the Python language like file writing and looping.

Once the scripts were ready, they could be uploaded to web servers and executed by scheduled jobs using CRON to keep things up to date. along the way, I also uncovered a way to publicise the locations of the sitemap files to search engine bots using robots.txt.  The structure of the instruction is the following:

User-agent: *
Sitemap: sitemap.xml

This means that it announces to all bots the location of the sitemap file. In my case, I always included the full URL for the XML file and that clearly varies by website location.

Redirecting a WordPress site to its home page when its loop finds no posts

5th November 2022

Since I created a bespoke theme for this site, I have been tweaking things as I go. The basis came from the WordPress Theme Developer Handbook, which gave me a simpler starting point shorn of all sorts of complexity that is encountered with other themes. Naturally, this means that there are little rough edges that need tidying over time.

One of these is dealing with errors on the site like when content is not found. This could be a wrong address or a search query that finds no matching posts. When that happens, there is a redirection to the home page using some simple JavaScript within the loop fallback code enclosed within script start and end tags (including the whole code triggers the action from this post so it cannot be shown here):

location.href="[blog home page ]";

The bloginfo function can be used with the url keyword to find the home page so this does not get hard coded. For now, this works so long as JavaScript is enabled but a more robust approach may come in time. It is not possible to do a PHP redirect because of the nature of HTTP: when headers have been sent, it is not possible to do server redirects. At this stage, things become client side so using JavaScript is one way to go instead.

A new look

11th October 2021

Things have been changing on here. Much of that has been behind the scenes with a move to a new VPS for extra speed and all the upheaval that brings. It also gained me a better system for less money than the old upgrade path was costing me and everything feels more responsive as well. Extra work has gone into securing the website as well and I have learned a lot as that has progressed. New lessons were added to older, and sometimes forgotten, ones.

The more obvious change for those who have been here before is that the visual appearance has been refreshed. A new theme has been applied with a multitude of tweaks to make it feel unique and to iron out any rough edges that there may be. This remains a WordPress-based website and new theme is a variant of the Appointee subtheme of the Appointment theme. WordPress does only supports child theming but not grandchild theming so I had to make a copy of Appointee of my own so I could modify things as I see fit.

To my eyes, things do look cleaner, crisper and brighter so I hope that it feels the same to you. Like so many designs these days, the basis is the Bootstrap framework and that is no bad thing in my mind though the standardisation may be too much for some tastes. What has become challenging is that it is getter harder to find new spins on more traditional layouts with everything going for a more magazine-like appearance and summaries being shown on the front page instead of complete articles. That probably reflects how things are going for websites these days so it may be that the next refresh could be more home grown and that is a while away yet.

As the website heads towards its sixteenth year, there is bound to be continuing change. In some ways, I prefer that some things remain unchanged so I use the classic editor instead of Gutenburg because that works best for me. Block-based editing is not for me since I prefer to tinker with code anyway. Still, not all of its influences can be avoided and I have needed to figure out the new widgets interface. It did not feel that intuitive but I suppose that I will grow accustomed to it.

My interest in technology continues even if it saddens me at time and some things do not impress me; the Windows 11 taskbar is one of those so I will not be in any hurry to move away from Windows 10. Still, the pandemic has offered its own learning with virtual conferencing allowing one to lurk and learn new things. For me, this has included R, Python, Julia and DevOps among other things. That proved worthwhile during a time with many restrictions. All that could yield more content yet and some already is on the way.

As ever, it is my own direct working with technology that yields some real niche ideas that others have not covered. With so many technology blogs out there, they may be getting less and less easy to find but everyone has their own journey so I hope to encounter more of them. There remain times when doing precedes telling and that is how it is on here. It is not all about appearances since content matters as much as it ever did.

Adding a new domain or subdomain to an SSL certificate using Certbot

11th June 2019

On checking the Site Health page of a WordPress blog, I saw errors that pointed to a problem with its SSL set up. The www subdomain was not included in the site’s certificate and was causing PHP errors as a result though they had no major effect on what visitors saw. Still, it was best to get rid of them so I needed to update the certificate as needed. Execution of a command like the following did the job:

sudo certbot --expand -d existing.com,www.example.com

Using a Let’s Encrypt certificate meant that I could use the certbot command since that already was installed on the server. The --expand and -d switches ensured that the listed domains were added to the certificate to sort out the observed problem. In the above, a dummy domain name is used but this was replaced by the real one to produce the desired effect and make things as they should have been.

Running cron jobs using the www-data system account

22nd December 2018

When you set up your own web server or use a private server (virtual or physical), you will find that web servers run using the www-data account. That means that website files need to be accessible to that system account if not owned by it. The latter is mandatory if you you want WordPress to be able to update itself with needing FTP details.

It also means that you probably need scheduled jobs to be executed using the privileges possess by the www-data account. For instance, I use WP-CLI to automate spam removal and updates to plugins, themes and WordPress itself. Spam removal can be done without the www-data account but the updates need file access and cannot be completed without this. Therefore, I got interested in setting up cron jobs to run under that account and the following command helps to address this:

sudo -u www-data crontab -e

For that to work, your own account needs to be listed in /etc/sudoers or be assigned to the sudo group in /etc/group. If it is either of those, then entering your own password will open the cron file for www-data and it can be edited as for any other account. Closing and saving the session will update cron with the new job details.

In fact, the same approach can be taken for a variety of commands where files only can be access using www-data. This includes copying, pasting and deleting files as well as executing WP-CLI commands. The latter issues a striking message if you run a command using the root account, a pervasive temptation given what it allows. Any alternative to the latter has to be better from a security standpoint.

Moving a website from shared hosting to a virtual private server

24th November 2018

This year has seen some optimisation being applied to my web presences guided by the results of GTMetrix scans. It is was then that I realised how slow things were so server loads were reduced. Anything that slowed response times, such as WordPress plugins, got removed. Matomo usage also was curtailed in favour of Google Analytics while HTML, CSS and JS minification followed. What had not happened was a search for a faster server and another website has been moved onto a virtual private server (VPS) to see how that would go.

Speed was not the only consideration since security was a factor too. After all, a VPS is more locked away from other users that a folder on a shared server. There also is the added sense of control so Let’s Encrypt SSL certificates can be added using the Electronic Frontier Foundation’s Certbot. That avoids the expense of using an SSL certificate provided through my shared hosting provider and a successful transition for my travel website may mean that this one undergoes the same move.

For the VPS, I chose Ubuntu 18.04 as its operating system and it came with the LAMP stack already in place. Have offload development websites, the mix of Apache, MySQL and PHP is more familiar to me than anything using Nginx or Python. It also means that .htaccess files become more useful than they were on my previous Nginx-based platform. Having full access to the operating system by means of SSH helps too and should mean that I have less calls on technical support since I can do more for myself. Any extra tinkering should not affect others either since this type of setup is well known to me and having an offline counterpart means that anything riskier is tried there beforehand.

Naturally, there were niggles to overcome with the move. The first to fix was to make the MySQL instance accept calls from outside the server so that I could migrate data there from elsewhere and I even got my shared hosting setup to start using the new database to see what performance boost it might give. To make all this happen, I first found the location of the relevant my.cnf configuration file using the following command:

find / -name my.cnf

Once I had the right file, I commented out the following line that it contained and restarted the database service afterwards using another command to stop the appearance of any error 111 messages:

bind-address 127.0.0.1
service mysql restart

After that, things worked as required and I moved onto another matter: uploading the requisite files. That meant installing an FTP server so I chose proftpd since I knew that well from previous tinkering. Once that was in place, file transfer commenced.

When that was done, I could do some testing to see if I had an active web server that loaded the website. Along the way, I also instated some Apache modules like mod-rewrite using the a2enmod command, restarting Apache each time I enabled another module.

Then, I discovered that Textpattern needed php-7.2-xml installed so the following command was executed to do this:

apt install php7.2-xml

Then, the following line was uncommented in the correct php.ini configuration file that I found using the same method as that described already for the my.cnf configuration and that was followed by yet another Apache restart:

extension=php_xmlrpc.dll

Addressing the above issues yield enough success for me to change the IP address in my Cloudflare dashboard so it point at the VPS and not the shared server. The changeover happened seamlessly without having to await DNS updates as once would have been the case. It had the added advantage of making both WordPress and Textpattern work fully.

With everything working to my satisfaction, I then followed the instructions on Certbot to set up my new Let’s Encrypt SSL certificate. Aside from a tweak to a configuration file and another Apache restart, the process was more automated than I had expected so I was ready to embark on some fine tuning to embed the new security arrangements. That meant updating .htaccess files and Textpattern has its own so the following addition was needed there:

RewriteCond %{HTTPS} !=on
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

This complemented what was already in the main .htaccess file and WordPress allows you include http(s) in the address it uses so that was another task completed. The general .htaccess only needed the following lines to be added:

RewriteCond %{SERVER_PORT} 80
RewriteRule ^(.*)$ https://www.assortedexplorations.com/$1 [R,L]

What all these achieve is to redirect insecure connections to secure ones for every visitor to the website. After that, internal hyperlinks without https needed updating along with any forms so that a padlock sign could be shown for all pages.

With the main work completed, it was time to sort out a lingering niggle regarding the appearance of an FTP login page every time a WordPress installation or update was requested. The main solution was to make the web server account the owner of the files and directories but the following line was added to wp-config.php as part of the fix even if it probably is not necessary:

define('FS_METHOD', 'direct');

There also was the non-operation of WP Cron and that was addressed using WP-CLI and a script from Bjorn Johansen. To make double sure of its effectiveness, the following was added to wp-config.php to turn off the usual WP-Cron behaviour:

define('DISABLE_WP_CRON', true);

Intriguingly, WP-CLI offers a long list of possible commands that are worth investigating. A few have been examined but more await attention.

Before those, I still need to get my new VPS to send emails. So far, sendmail has been installed, the hostname changed from localhost and the server restarted. More investigations are needed but what I have not is faster that what was there before so the effort has been rewarded already.

Overriding replacement of double or triple hyphenation in WordPress

7th June 2016

On here, I have posts with example commands that include double hyphens and they have been displayed merged together, something that has resulted in a comment posted by a visitor to this part of the web. All the while, I have been blaming the fonts that I have been using only for it to be the fault of WordPress itself.

Changing multiple dashes to something else has been a feature of Word autocorrect but I never expected to see WordPress aping that behaviour and it has been doing so for a few years now. The culprit is wptexturize and that cannot be disabled for it does many other useful things.

What happens is that the wptexturize filter changes ‘---‘ (double hyphens) to ‘–’ (– in web entity encoding) and ‘---‘ (triple hyphens) to ‘—’ (— in web entity encoding). The solution is to add another filter to the content that changes these back to the way they were and the following code does this:

add_filter( ‘the_content’ , ‘mh_un_en_dash’ , 50 );
function mh_un_en_dash( $content ) {
$content = str_replace( ‘–’ , ‘--‘ , $content );
$content = str_replace( ‘—’ , ‘---‘ , $content );
return $content;
}

The first line of the segment adds in the new filter that uses the function defined below it. The third and fourth lines above do the required substitution before the function returns the post content for display in the web page. The whole code block can be used to create a plugin or placed the theme’s functions.php file. Either way, things appear without the substitution confusing your readers. It makes me wonder if a bug report has been created for this because the behaviour looks odd to me.

Turning off Advanced Content Filtering in CKEditor

3rd February 2015

On one of my websites, I use Textpattern with CKEditor for editing of articles on there. This was working well until I upgraded CKEditor to a version with a number of 4.1 or newer because it started to change the HTML in my articles when I did not want it to do so, especially when it broke the appearance of the things. A search on Google revealed an unhelpful forum exchange that produced no solution to the issue so I decided to share one on here when I found it.

What I needed to do was switch off what is known as Advanced Content Filtering. It can be tuned but I felt that would take too much time so I implemented something like what you see below in the config.js with the ckeditor folder:

CKEDITOR.editorConfig = function( config ) {
config.allowedContent = true;
};

All settings go with the outer function wrapper and setting the config.allowedContent property to true within there sorted my problem as I wanted. Now, any HTML remains untouched and I am happy with the outcome. It might be better for features like Advanced Content Filtering to be switched off by default and turned on by those with the time and need for it, much like the one of the principles adopted by the WordPress project. Still, having any off switch is better than none at all.

Turning off the full height editor option in WordPress 4.0

10th September 2014

Though I keep a little eye on WordPress development, it is no way near as rigorous as when I submitted a patch that got me a mention on the contributor list of a main WordPress release. That may explain how the full editor setting, which is turned on by default passed by on me without my taking much in the way of notice of it.

WordPress has become so mature now that I almost do not expect major revisions like the overhauls received by the administration back-end in 2008. The second interface was got so right that it still is with us and there were concerns in my mind at the time as to how usable it would be. Sometimes, those initial suspicions can come to nothing.

However, WordPress 4.0 brought a major change to the editor and I unfortunately am not sure that it is successful. A full height editor sounds a good idea in principle but I found some rough edges to its present implementation that leave me wondering if any UX person got to reviewing it. The first reason is that scrolling becomes odd with the editor’s toolbar becoming fixed when you scroll down far enough on an editor screen. The sidebar scrolling then is out of sync with the editor box, which creates a very odd sensation. Having keyboard shortcuts like CTRL+HOME and CTRL+END not working as they should only convinced me that the new arrangement was not for me and I wanted to turn it off.

A search with Google turned up nothing of note so I took to the WordPress.org forum to see if I could get any joy. That revealed that I should have thought of looking in the screen options dropdown box for an option called “Expand the editor to match the window height” so I could clear that tickbox. Because of the appearance of a Visual Editor control on there, I looked on the user profile screen and found nothing so the logic of how things are set up is sub-optimal.  Maybe, the latter option needs to be a screen option now too. Thankfully, the window height editor option only needs setting once for both posts and pages so you are covered for all eventualities at once.

With a distraction-free editing option, I am not sure why someone went for the full height editor too. If WordPress wanted to stick with this, it does need more refinement so it behaves more conventionally. Personally, I would not build a website with that kind of ill-synchronised scrolling effect so it is something needs work as does the location of the Visual Editor setting. It could be that both settings need to be at the user level and not with one being above that level while another is at it. Until I got the actual solution, I was faced with using distraction-free mode all the time and also installed the WP Editor plugin too. That remains due to its code highlighting even if dropping into code view always triggers the need to create a new revision. Despite that, all is better in the end.

Setting the PHP version in .htaccess on Apache web servers

7th September 2014

The default PHP version on my outdoors, travel and photography website is 5.2.17 and that is getting on a bit now since it is no longer supported by the PHP project and has not been thus since 2011. One obvious impact was Piwik, which I used for web analytics and needs at least 5.3.2. WordPress 4.0 even needs 5.2.24 so that upgrade became implausible so I contacted Webfusion’s support team and they showed me how to get to at least 5.3.3 and even as far as 5.5.9. The trick is the addition of a line of code to the .htaccess file (near the top was my choice) like one of the following:

PHP 5.3.x

AddHandler application/x-httpd-php53 .php

PHP 5.5.x

AddHandler application/x-httpd-php55 .php

When I got one of these in place, things started to look promising but for a locked database due to my not watching how big it had got. Replacing it with two additional databases addressed the problem of losing write access though there was a little upheaval caused by this. Using PHP 5.5.9 meant that I spotted messages regarding the deprecation of the mysql_connect function so that needed fixing too (prefixing it with @ might be a temporary fix but a more permanent one always is better so that is what I did in the form of piggybacking off what WordPress uses; MySQLi and PDO_MySQL are other options). Sorting the database issue meant that I saw the upgrade message for WordPress as well as a mix of plugins and themes so all looked better and I need worry less about losing security updates. Also, I am up to the latest version of Piwik too and that’s an even better way to be.

  • All the views that you find expressed on here in postings and articles are mine alone and not those of any organisation with which I have any association, through work or otherwise. As regards editorial policy, whatever appears here is entirely of my own choice and not that of any other person or organisation.

  • Please note that everything you find here is copyrighted material. The content may be available to read without charge and without advertising but it is not to be reproduced without attribution. As it happens, a number of the images are sourced from stock libraries like iStockPhoto so they certainly are not for abstraction.

  • With regards to any comments left on the site, I expect them to be civil in tone of voice and reserve the right to reject any that are either inappropriate or irrelevant. Comment review is subject to automated processing as well as manual inspection but whatever is said is the sole responsibility of the individual contributor.