URL | Technology Tales

TOPIC: URL

Removing query strings from any URL on an Nginx-powered website

12th April 2025

My public transport website is produced using Hugo and is hosted on a web server running Nginx. Usually, I use Apache, so this is an exception. When Google highlighted some duplication caused by unneeded query strings, I set to work. However, Nginx supports neither .htaccess files nor Apache's mod_rewrite module, so URL handling such as redirection cannot be set up that way. Thus, such clauses have to go somewhere else and take a different form.

In my case, the configuration file to edit is /etc/nginx/sites-available/default because that was what was enabled. Once I had that open, I needed to find the following block:

location / {
        # First attempt to serve request as file, then
        # as directory, then fall back to displaying a 404.
        try_files $uri $uri/ =404;
}

Because I have one server section for port 80 and another for port 443, the block appeared twice, and there were two locations that I needed to update, though I may have got away without altering the second of these. After adding the redirection clause, the block became:

location / {
        # First attempt to serve request as file, then
        # as directory, then fall back to displaying a 404.
        try_files $uri $uri/ =404;

        # Remove query strings only when necessary
        if ($args) {
                rewrite ^(.*)$ $1? permanent;
        }
}

The result of the addition is a permanent (301) redirection whenever there are arguments passed in a query string. The initial ^(.*)$ portion captures the requested path, and the $1? portion is the rewritten URL, with the trailing question mark stopping Nginx from appending the original arguments to it. In other words, the visitor is redirected from the original address to a new one containing only the part preceding the question mark.
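The decision that the rule makes can be sketched in Python as a way of reasoning about which requests get redirected. This is only an illustration of the logic, not how Nginx works internally, and the function name redirect_target is a hypothetical one chosen for this example:

```python
from typing import Optional
from urllib.parse import urlsplit


def redirect_target(request_uri: str) -> Optional[str]:
    """Return the 301 target for a request, or None when no redirect is needed."""
    parts = urlsplit(request_uri)
    if not parts.query:
        # Mirrors `if ($args)`: an empty query string means nothing to do.
        return None
    # Mirrors `$1?`: keep the path and drop everything after the question mark.
    return parts.path


print(redirect_target("/timetables/?utm_source=feed"))
print(redirect_target("/timetables/"))
```

A request with arguments yields the bare path as its destination, while a request without any is left alone, so there is no redirect loop.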

Handily, Nginx allows you to test your updated configuration using the following command:

sudo nginx -t

That helped me with some debugging. Once all was in order, I needed to reload the service by issuing this command:

sudo systemctl reload nginx

With Apache, there is no need to restart the service after updating the .htaccess file, which adds some convenience. The different configuration file locations also call for some care with backups when upgrading the operating system or moving from one server to another. Apart from that, all works well, proving that there can be different ways to complete the same task.

Add Canonical Tags to WordPress without plugins

31st March 2025

Search engines cannot know which copy is the real content when there is any duplication, unless you tell them. That is where canonical tags come in handy. By default, WordPress appears to add these for posts and pages, which makes sense. However, you can add them for other places too. While a plugin can do this for you, adding some code to your theme's functions.php file also does the job. This is how it could look:

function add_canonical_link() {
    global $post;

    // Check if we're on a single post/page
    if (is_singular()) {
        $canonical_url = get_permalink($post->ID);
    } 
    // For the homepage
    elseif (is_home() || is_front_page()) {
        $canonical_url = home_url('/');
    }
    // For category archives
    elseif (is_category()) {
        $canonical_url = get_category_link(get_query_var('cat'));
    }
    // For tag archives
    elseif (is_tag()) {
        $canonical_url = get_tag_link(get_query_var('tag_id'));
    }
    // For other archive pages
    elseif (is_archive()) {
        $canonical_url = get_permalink();
    }
    // Fallback for other pages
    else {
        $canonical_url = get_permalink();
    }

    // Output the canonical link
    echo '<link rel="canonical" href="' . esc_url($canonical_url) . '" />' . "\n";
}

// Hook the function to wp_head
add_action('wp_head', 'add_canonical_link');

// Remove default canonical link
remove_action('wp_head', 'rel_canonical');

The first part defines a function that works out the canonical URL and creates the tag to be added. With that completed, the penultimate piece of code hooks it into the wp_head part of the web page, while the last line gets rid of the default link to avoid any duplication of output.
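One way to confirm that a page ends up with a single canonical tag is to inspect its HTML source. The following is a minimal Python sketch using only the standard library. The class and function names are hypothetical, and the sample markup with its example.com address is made up for illustration:

```python
from html.parser import HTMLParser


class CanonicalCollector(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel:
            self.canonicals.append(attributes.get("href") or "")


def find_canonicals(html: str) -> list:
    """Return all canonical URLs found in the supplied HTML source."""
    parser = CanonicalCollector()
    parser.feed(html)
    return parser.canonicals


# Made-up page head: one canonical tag plus an unrelated stylesheet link.
sample = (
    '<head><link rel="canonical" href="https://example.com/category/news/" />'
    '<link rel="stylesheet" href="style.css"></head>'
)
print(find_canonicals(sample))
```

If the returned list has more than one entry for a real page, then something else is still outputting its own canonical tag.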

WordPress URL management with canonical tags and permalink simplification

29th March 2025

Recently, I have been going through the content, rewriting things where necessary. In the early days, there were some posts following diary and announcement styles that I now avoid. Some now have been moved to a more appropriate place for those, while others have been removed.

While this piece might fall into the announcement category, I am going to mix things up too. After some prevarication, I have removed dates from the addresses of entries like this after seeing some duplication. Defining canonical URLs in the page header like this does help:

<link rel="canonical" href="[URL]">

However, it becomes tricky when you have zero-filled and non-zero-filled dates going into URLs. Using the following in a .htaccess file redirects the latter to the former, which is a workaround:

RewriteRule ^([0-9]{4})/([1-9])/([0-9]{1,2})/(.*)$ /$1/0$2/$3/$4 [R=301,L]
RewriteRule ^([0-9]{4})/(0[1-9]|1[0-2])/([1-9])/(.*)$ /$1/$2/0$3/$4 [R=301,L]

The first of these lines zero-fills the month component, while the second zero-fills the day component. Here, [0-9]{4} looks for a four-digit year. Then, [1-9] picks up the non-zero-filled components that need zero-prefixing. The replacements are 0$2 or 0$3 as needed.
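To see how the two rules combine when both the month and the day need zero-filling, here is a small Python sketch that applies the same patterns and follows the resulting redirects until neither rule matches. It is only a model of the behaviour, not something Apache runs, and the function name and the sample path are hypothetical:

```python
import re

# Python translations of the two RewriteRule patterns. As in a per-directory
# .htaccess context, paths are matched without their leading slash.
MONTH_RULE = (
    re.compile(r"^([0-9]{4})/([1-9])/([0-9]{1,2})/(.*)$"),
    r"/\g<1>/0\g<2>/\g<3>/\g<4>",
)
DAY_RULE = (
    re.compile(r"^([0-9]{4})/(0[1-9]|1[0-2])/([1-9])/(.*)$"),
    r"/\g<1>/\g<2>/0\g<3>/\g<4>",
)


def zero_fill_date_path(path: str) -> str:
    """Follow the redirects until no rule matches and return the final path."""
    for _ in range(5):  # guard against accidental loops
        stripped = path.lstrip("/")
        for pattern, replacement in (MONTH_RULE, DAY_RULE):
            if pattern.match(stripped):
                path = pattern.sub(replacement, stripped)
                break
        else:
            return path  # neither rule matched, so no further redirect
    return path


print(zero_fill_date_path("/2025/4/5/some-post/"))
```

In this model, an address with a single-digit month and a single-digit day takes two hops: the first rule fixes the month, then the second rule fixes the day on the follow-up request.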

Naturally, this needs URL rewriting to be turned on for it to work. Since my set-up is on Apache, the mod_rewrite module needs to be activated too. Then, your configuration needs to allow its operation. With dates removed from WordPress permalinks, I had to add the following line to redirect old addresses to new ones for the sake of search engine optimisation:

RedirectMatch 301 ^/([0-9]{4})/([0-9]{2})/([0-9]{2})/(.*)$ /$4

Here, [0-9]{4} picks up the four-digit year and [0-9]{2} finds the two-digit month and day. Then, (.*) is the rest of the URL that is retained, as signalled by /$4 at the end. That redirects things nicely, without my having to have a line for every post on the website. Another refinement was to remove query strings from every page a visitor would see:

RewriteCond %{REQUEST_URI} !(^/wp-admin/|^/wp-login\.php$) [NC]
RewriteCond %{QUERY_STRING} .
RewriteCond %{QUERY_STRING} !(&preview=true) [NC]
RewriteRule ^(.*)$ /$1? [R=301,L]

This still allows the back end and login screens to work as before, along with post previews during the writing stage. One final note is that I am not using the default login address for the sake of added security, though that address does not need to be mentioned anywhere in the .htaccess file.
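Both the RedirectMatch pattern and the query string conditions can be tried out away from the server with a short Python sketch. It models the rules rather than reproducing Apache's processing, and the function names and sample addresses are hypothetical:

```python
import re
from typing import Optional
from urllib.parse import urlsplit

DATED_PERMALINK = re.compile(r"^/([0-9]{4})/([0-9]{2})/([0-9]{2})/(.*)$")
BACK_END = re.compile(r"^/wp-admin/|^/wp-login\.php$", re.IGNORECASE)
PREVIEW = re.compile(r"&preview=true", re.IGNORECASE)


def strip_date(path: str) -> str:
    """Mirror the RedirectMatch pattern: keep only what follows the date."""
    return DATED_PERMALINK.sub(r"/\4", path)


def strip_query(request_uri: str) -> Optional[str]:
    """Mirror the three RewriteCond conditions and the final RewriteRule.

    Returns the redirect target, or None when the request is left alone.
    """
    parts = urlsplit(request_uri)
    if BACK_END.search(parts.path):   # back end and login screens are exempt
        return None
    if not parts.query:               # `RewriteCond %{QUERY_STRING} .`
        return None
    if PREVIEW.search(parts.query):   # post previews are exempt
        return None
    return parts.path


print(strip_date("/2025/03/29/my-post/"))
print(strip_query("/my-post/?utm_source=feed"))
print(strip_query("/?p=123&preview=true"))
```

Trying a few addresses like this before editing the live .htaccess file can show whether a pattern catches more or less than intended.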