Technology Tales

Notes drawn from experiences in consumer and enterprise technology

TOPIC: WORDPRESS

Building a sitemap in XML

24th November 2022

While there are many tools that will build XML site maps, there is some satisfaction to be had in creating your own. This is despite there being a multitude of search engine optimisation plugins for content management systems like WordPress or what is built into static site generators like Hugo. Sometimes, building your own allows for added simplicity, and that is shared with recent efforts in WordPress theme development.

The sitemap XML protocol is simple enough to offer a short coding project. The basis was what Hugo generates, and I used Python to create the XML files. The only libraries that I needed were configparser, SQLAlchemy and pandas. The first two of these allowed databases to be queried, and the last on the list was used for data processing. Otherwise, it was a case of using what is built into the Python language, like file writing and looping.

Once the scripts were ready, they could be uploaded to web servers and executed by scheduled jobs using CRON to keep things up to date. Along the way, I also uncovered a way to publicise the locations of the sitemap files to search engine bots using robots.txt.  The structure of the instruction is the following:

User-agent: *
Sitemap: sitemap.xml

This means that it announces to all bots the location of the sitemap file. In my case, I always included the full URL for the XML file, and that clearly varies by website location.

Redirecting a WordPress site to its home page when its loop finds no posts

5th November 2022

Since I created a bespoke theme for this site, I have been tweaking things as I go. The basis came from the WordPress Theme Developer Handbook, which gave me a simpler starting point shorn of all sorts of complexity that is encountered with other themes. Naturally, this means that there are little rough edges that need tidying over time.

One of these is dealing with errors on the site, like when content is not found. This could be a wrong address or a search query that finds no matching posts. When that happens, there is a redirection to the home page using some simple JavaScript within the loop fallback code enclosed within script start and end tags (including the whole code triggers the action from this post so it cannot be shown here):

location.href="[blog home page ]";

The bloginfo function can be used with the url keyword to find the home page, avoiding hard-coding. For now, this works so long as JavaScript is enabled, but a more robust approach may come in time. It is not possible to do a PHP redirect because of the nature of HTTP: when headers have been sent, it is not possible to do server redirects. At this stage, things become client side, so using JavaScript is one way to go instead.

Behind the scenes of a website refresh and security overhaul

11th October 2021

Things have been changing on here. Much of that has been behind the scenes with a move to a new VPS for extra speed and all the upheaval that brings. It also gained me a better and more responsive system for less money than the old upgrade path was costing me. Extra work has gone into securing the website too, something that has taught me a lot as that has progressed. New lessons were added to older, and sometimes forgotten, ones.

The more obvious change for those who have been here before is that the visual appearance has been refreshed. A new theme has been applied with a multitude of tweaks to make it feel unique and to iron out any rough edges that there may be. This remains a WordPress-based website, and the new theme is a variant of the Appointee child theme of the Appointment theme. Since WordPress does only support child theming but not grandchild theming, I had to make a copy of Appointee of my own so I could modify things as I see fit.

To my eyes, things do look cleaner, crisper and brighter, so I hope that it feels the same to you. Like so many designs these days, the basis is the Bootstrap framework and that is no bad thing in my mind, though the standardisation may be too much for some tastes. What has become challenging is that it is getter harder to find new spins on more traditional layouts, with everything going for a more magazine-like appearance and summaries being shown on the front page instead of complete articles. That probably reflects how things are going for websites these days, which could make the next refresh a more home-grown effort, even if that is a while away yet.

As the website heads towards its sixteenth year, there is bound to be continuing change. In some ways, I prefer that some things remain unchanged, so I use the classic editor instead of Gutenberg because that works best for me. Block-based editing is not for me, since I prefer to tinker with code anyway. Still, not all of its influences can be avoided, leaving me to figure out the new widgets interface. While it did not feel that intuitive, I suppose that I will grow accustomed to it.

My interest in technology continues, even if it saddens me at this time and some things do not impress me; the Windows 11 taskbar is one of those, so I will not be in any hurry to move away from Windows 10. Still, the pandemic has offered its own learning, with virtual conferencing allowing one to lurk and learn new things. For me, this has included R, Python, Julia and DevOps among other things. That proved worthwhile during a time with many restrictions. All that could yield more content yet, and some already is on the way.

As ever, it is my own direct working with technology that yields some real niche ideas that others have not covered. With so many technology blogs out there, they may be getting less and less easy to find, yet everyone has their own journey, so I hope to encounter more of them. There remain times when doing precedes telling, which is how it is on here. It is not all about appearances, since content matters as much as it ever did.

Running cron jobs using the www-data system account

22nd December 2018

When you set up your own web server or use a private server (virtual or physical), you will find that web servers run using the www-data account. That means that website files need to be accessible to that system account if not owned by it. The latter is mandatory if you want WordPress to be able to update itself with needing FTP details.

It also means that you probably need scheduled jobs to be executed using the privileges possessed by the www-data account. For instance, I use WP-CLI to automate spam removal and updates to plugins, themes and WordPress itself. Spam removal can be done without the www-data account, but the updates need file access and cannot be completed without this. Therefore, I got interested in setting up cron jobs to run under that account and the following command helps to address this:

sudo -u www-data crontab -e

For that to work, your own account needs to be listed in /etc/sudoers or be assigned to the sudo group in /etc/group. If it is either of those, then entering your own password will open the cron file for www-data, and it can be edited as for any other account. Closing and saving the session will update cron with the new job details.

In fact, the same approach can be taken for a variety of commands where files only can be accessed using www-data. This includes copying, pasting and deleting files as well as executing WP-CLI commands. The latter issues a striking message if you run a command using the root account, a pervasive temptation given what it allows. Any alternative to the latter has to be better from a security standpoint.

Moving a website from shared hosting to a virtual private server

24th November 2018

This year has seen some optimisation being applied to my web presences, guided by the results of GTMetrix scans. It was then that I realised how slow things were, so server loads were reduced. Anything that slowed response times, such as WordPress plugins, got removed. Usage of Matomo also was curtailed in favour of Google Analytics, while HTML, CSS and JS minification followed. Something that had yet to happen was a search for a faster server. Now, another website has been moved onto a virtual private server (VPS) to see how that would go.

Speed was not the only consideration, since security was a factor too. After all, a VPS is more locked away from other users than a folder on a shared server. There also is the added sense of control, so Let's Encrypt SSL certificates can be added using the Electronic Frontier Foundation's Certbot. That avoids the expense of using an SSL certificate provided through my shared hosting provider, and a successful transition for my travel website may mean that this one undergoes the same move.

For the VPS, I chose Ubuntu 18.04 as its operating system, and it came with the LAMP stack already in place. Have offload development websites, the mix of Apache, MySQL and PHP is more familiar to me than anything using Nginx or Python. It also means that .htaccess files become more useful than they were on my previous Nginx-based platform. Having full access to the operating system using SSH helps too and should mean that I have fewer calls on technical support since I can do more for myself. Any extra tinkering should not affect others either, since this type of setup is well known to me and having an offline counterpart means that anything riskier is tried there beforehand.

Naturally, there were niggles to overcome with the move. The first to fix was to make the MySQL instance accept calls from outside the server so that I could migrate data there from elsewhere, and I even got my shared hosting setup to start using the new database to see what performance boost it might give. To make all this happen, I first found the location of the relevant my.cnf configuration file using the following command:

find / -name my.cnf

Once I had the right file, I commented out the following line that it contained and restarted the database service afterwards, using another command to stop the appearance of any error 111 messages:

bind-address 127.0.0.1
service mysql restart

After that, things worked as required and I moved onto another matter: uploading the requisite files. That meant installing an FTP server, so I chose proftpd since I knew that well from previous tinkering. Once that was in place, file transfer commenced.

When that was done, I could do some testing to see if I had an active web server that loaded the website. Along the way, I also instated some Apache modules like mod-rewrite using the a2enmod command, restarting Apache each time I enabled another module.

Then, I discovered that Textpattern needed php-7.2-xml installed, so the following command was executed to do this:

apt install php7.2-xml

Then, the following line was uncommented in the correct php.ini configuration file that I found using the same method as that described already for the my.cnf configuration and that was followed by yet another Apache restart:

extension=php_xmlrpc.dll

Addressing the above issues yielded enough success for me to change the IP address in my Cloudflare dashboard so it pointed at the VPS and not the shared server. The changeover happened seamlessly without having to await DNS updates as once would have been the case. It had the added advantage of making both WordPress and Textpattern work fully.

With everything working to my satisfaction, I then followed the instructions on Certbot to set up my new Let's Encrypt SSL certificate. Aside from a tweak to a configuration file and another Apache restart, the process was more automated than I had expected, so I was ready to embark on some fine-tuning to embed the new security arrangements. That meant updating .htaccess files and Textpattern has its own, so the following addition was needed there:

RewriteCond %{HTTPS} !=on
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

This complemented what was already in the main .htaccess file, and WordPress allows you to include http(s) in the address it uses, so that was another task completed. The general .htaccess only needed the following lines to be added:

RewriteCond %{SERVER_PORT} 80
RewriteRule ^(.*)$ https://www.assortedexplorations.com/$1 [R,L]

What all these achieve is to redirect insecure connections to secure ones for every visitor to the website. After that, internal hyperlinks without https needed updating along with any forms so that a padlock sign could be shown for all pages.

With the main work completed, it was time to sort out a lingering niggle regarding the appearance of an FTP login page every time a WordPress installation or update was requested. The main solution was to make the web server account the owner of the files and directories, but the following line was added to wp-config.php as part of the fix even if it probably is not necessary:

define('FS_METHOD', 'direct');

There also was the non-operation of WP Cron and that was addressed using WP-CLI and a script from Bjorn Johansen. To make double sure of its effectiveness, the following was added to wp-config.php to turn off the usual WP-Cron behaviour:

define('DISABLE_WP_CRON', true);

Intriguingly, WP-CLI offers a long list of possible commands that are worth investigating. A few have been examined, but more await attention.

Before those, I still need to get my new VPS to send emails. So far, sendmail has been installed, the hostname changed from localhost and the server restarted. More investigations are needed, but what I have not is faster than what was there before, so the effort has been rewarded already.

Overriding replacement of double or triple hyphenation in WordPress

7th June 2016

On here, I have posts with example commands that include double hyphens, and they have been displayed merged together, something that has resulted in a comment posted by a visitor to this part of the web. All the while, I have been blaming the fonts that I have been using, only for it to be the fault of WordPress itself.

Changing multiple dashes to something else has been a feature of Word autocorrect, but I never expected to see WordPress aping that behaviour, and it has been doing so for a few years now. The culprit is wptexturize and that cannot be disabled, for it does many other useful things.

What happens is that the wptexturize filter changes '--' (double hyphens) to '–' (– in web entity encoding) and '---' (triple hyphens) to '—' (— in web entity encoding). The solution is to add another filter to the content that changes these back to the way they were, and the following code does this:

add_filter( 'the_content' , 'mh_un_en_dash' , 50 );
function mh_un_en_dash( $content ) {
$content = str_replace( '–' , '--' , $content );
$content = str_replace( '—' , '---' , $content );
return $content;
}

The first line of the segment adds in the new filter that uses the function defined below it. The third and fourth lines above do the required substitution before the function returns the post content for display in the web page. The whole code block can be used to create a plugin or placed in the theme's functions.php file. Either way, things appear without the substitution, confusing your readers. It makes me wonder if a bug report has been created for this because the behaviour looks odd to me.

Turning off Advanced Content Filtering in CKEditor

3rd February 2015

On one of my websites, I use Textpattern with CKEditor for editing of articles on there. This was working well until I upgraded CKEditor to a version with a number of 4.1 or newer because it started to change the HTML in my articles when I did not want it to do so, especially when it broke the appearance of the things. A search on Google revealed an unhelpful forum exchange that produced no solution to the issue so I decided to share one on here when I found it.

What I needed to do was switch off what is known as Advanced Content Filtering. It can be tuned but I felt that would take too much time so I implemented something like what you see below in the config.js with the ckeditor folder:

CKEDITOR.editorConfig = function( config ) {
config.allowedContent = true;
};

All settings go with the outer function wrapper and setting the config.allowedContent property to true within there sorted my problem as I wanted. Now, any HTML remains untouched and I am happy with the outcome. It might be better for features like Advanced Content Filtering to be switched off by default and turned on by those with the time and need for it, much like the one of the principles adopted by the WordPress project. Still, having any off switch is better than none at all.

Turning off the full height editor option in WordPress 4.0

10th September 2014

Though I casually follow WordPress development, it's not nearly as rigorous as when I submitted a patch that earned me a mention on a main WordPress release's contributor list. This may explain why I barely noticed the full editor setting, which is turned on by default.

WordPress has become so mature now that I almost do not expect major revisions like the overhauls received by the administration back-end in 2008. The second interface was got so right that it still is with us, even if there were concerns in my mind at the time as to how usable it would be. Sometimes, those initial suspicions can come to nothing.

However, WordPress 4.0 introduced a major editor change that I'm not sure is successful. A full-height editor sounds good in principle, but its implementation has rough edges that make me wonder if any UX person reviewed it. Scrolling becomes strange, with the editor's toolbar fixing in place when you scroll down far enough. The sidebar then scrolls out of sync with the editor box, creating an odd sensation. Keyboard shortcuts like CTRL + HOME and CTRL + END don't work properly, which convinced me this new arrangement wasn't for me and I wanted to disable it.

A Google search found nothing useful, so I tried the WordPress.org forum. This revealed I should have looked in the screen options dropdown box for "Expand the editor to match the window height" to deselect it. Because of a Visual Editor control there, I'd checked the user profile screen but found nothing, showing the setup logic is poor. Perhaps the Visual Editor option should be a screen option too. Thankfully, the window height editor setting only needs changing once for both posts and pages, covering all situations.

With a distraction-free editing option available, I'm not sure why someone added the full height editor too. If WordPress keeps this feature, it needs refinement to behave more conventionally. I wouldn't build a website with such ill-synchronised scrolling. This needs work, as does the Visual Editor setting location. Perhaps both settings should be at the user level, rather than having one above that level. Before finding the solution, I considered using distraction-free mode permanently and installed the WP Editor plugin. I kept the plugin for its code highlighting, even though entering code view always creates a new revision. Despite this issue, things are now better.

Setting the PHP version in .htaccess on Apache web servers

7th September 2014

The default PHP version on my outdoors, travel and photography website is 5.2.17 and that is getting on a bit now since it is no longer supported by the PHP project and has not been thus since 2011. One obvious impact was Piwik, which I use for web analytics and needs at least 5.3.2. Since WordPress 4.0 will not work without having 5.2.24 or later, that upgrade became implausible. Therefore, I contacted Webfusion's support team, and they showed me how to get to at least 5.3.3 and even as far as 5.5.9. The trick is the addition of a line of code to the .htaccess file (near the top was my choice) like one of the following:

PHP 5.3.x

AddHandler application/x-httpd-php53 .php

PHP 5.5.x

AddHandler application/x-httpd-php55 .php

When I got one of these in place, things started to look promising, but for a locked database due to my not watching how big it had got. Replacing it with two additional databases addressed the problem of losing write-access, though there was a little upheaval caused by this. Using PHP 5.5.9 meant that I spotted messages regarding the deprecation of the mysql_connect function, so that needed fixing too. Prefixing it with @ might have been a temporary fix while I sought a more permanent one. Thus, I opted for piggybacking off what WordPress uses; make use of MySQLi or PDO_MySQL are other options. Sorting the database issue meant that I saw the upgrade message for WordPress as well as a mix of plugins and themes, so all looked better, leaving me to be less concerned about losing security updates. Also, I am up to the latest version of Piwik too, and that's an even better way to be.

Setting up MySQL on Sabayon Linux

27th September 2012

For quite a while now, I have offline web servers for doing a spot of tweaking and tinkering away from the gaze of web users that visit what I have on there. Therefore, one of the tests that I apply to any prospective main Linux distro is the ability to set up a web server on there. This is more straightforward for some than for others. For Ubuntu and Linux Mint, it is a matter of installing the required software and doing a small bit of configuration. My experience with Sabayon is that it needs a little more effort than this, so I am sharing it here for the installation of MySQL.

The first step is to install the software using the commands that you find below. The first pops the software onto the system while the second completes the set-up. The --basedir option is need with the latter because it won't find things without it. It specifies the base location on the system, and it's /usr in my case.

sudo equo install dev-db/mysql
sudo /usr/bin/mysql_install_db --basedir=/usr

With the above complete, it's time to start the database server and set the password for the root user. That's what the two following commands achieve. Once your root password is set, you can go about creating databases and adding other users using the MySQL command line

sudo /etc/init.d/mysql start
mysqladmin -u root password 'password'

The last step is to set the database server to start every time you start your Sabayon system. The first command adds an entry for MySQL to the default run level so that this happens. The purpose of the second command is to check that this happened before restarting your computer to discover if it really happens. This procedure also is necessary for having an Apache web server behave in the same way, so the commands are worth having and even may have a use for other services on your system. ProFTP is another that comes to mind, for instance.

sudo rc-update add mysql default
sudo rc-update show | grep mysql

  • The content, images, and materials on this website are protected by copyright law and may not be reproduced, distributed, transmitted, displayed, or published in any form without the prior written permission of the copyright holder. All trademarks, logos, and brand names mentioned on this website are the property of their respective owners. Unauthorised use or duplication of these materials may violate copyright, trademark and other applicable laws, and could result in criminal or civil penalties.

  • All comments on this website are moderated and should contribute meaningfully to the discussion. We welcome diverse viewpoints expressed respectfully, but reserve the right to remove any comments containing hate speech, profanity, personal attacks, spam, promotional content or other inappropriate material without notice. Please note that comment moderation may take up to 24 hours, and that repeatedly violating these guidelines may result in being banned from future participation.

  • By submitting a comment, you grant us the right to publish and edit it as needed, whilst retaining your ownership of the content. Your email address will never be published or shared, though it is required for moderation purposes.