TOPIC: HTTP 301
Removing query strings from any URL on an Nginx-powered website
12th April 2025
My public transport website is produced using Hugo and is hosted on a web server running Nginx. Usually, I use Apache, so this is an exception. When Google highlighted some duplication caused by unneeded query strings, I set to work. However, Nginx supports neither .htaccess files nor mod_rewrite, so tasks like URL redirection cannot be handled in the way that they are on Apache. Thus, such clauses have to go somewhere else and take a different form.
In my case, the configuration file to edit is /etc/nginx/sites-available/default because that was what was enabled. Once I had that open, I needed to find the following block:
location / {
    # First attempt to serve request as file, then
    # as directory, then fall back to displaying a 404.
    try_files $uri $uri/ =404;
}
Because I have one section for port 80 and another for port 443, there were two locations that I needed to update due to duplication, though I may have got away without altering the second of these. After adding the redirection clause, the block became:
location / {
    # First attempt to serve request as file, then
    # as directory, then fall back to displaying a 404.
    try_files $uri $uri/ =404;
    # Remove query strings only when necessary
    if ($args) {
        rewrite ^(.*)$ $1? permanent;
    }
}
The result of the addition is a permanent (301) redirection whenever there are arguments passed in a query string. The initial ^(.*)$ portion captures the requested path, while the $1? portion is the rewritten URL without a query string: the trailing question mark tells Nginx not to append the original arguments. In other words, the request is redirected from the original address to a new one containing only the part preceding the question mark.
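To make the behaviour of that rule concrete, here is a minimal Python sketch of the same decision. It is only an illustration and not how Nginx works internally; the function name strip_query and the example.com addresses are made up for the purpose. It also assumes that a query string of exactly "0" is left alone, since Nginx treats an empty string or "0" as false in an if test.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str):
    """Return (status, location) in the spirit of the redirect rule above.

    A 301 with the query-free address when a query string is present,
    otherwise (200, url) to signal that the request is served as-is.
    """
    parts = urlsplit(url)
    # Roughly "if ($args)": Nginx treats an empty value or "0" as false.
    if parts.query not in ("", "0"):
        cleaned = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
        return 301, cleaned
    return 200, url


# Hypothetical addresses, used only for illustration.
print(strip_query("https://example.com/routes/?utm_source=feed"))
print(strip_query("https://example.com/routes/"))
```

The first call yields a 301 pointing at the address without its query string, while the second is passed through untouched because there is nothing to remove.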
Handily, Nginx allows you to test your updated configuration using the following command:
sudo nginx -t
That helped me with some debugging. Once all was in order, I needed to reload the service by issuing this command:
sudo systemctl reload nginx
With Apache, there is no need to restart the service after updating the .htaccess file, which adds some convenience. Having the configuration in a different location also means taking some care with backups when upgrading the operating system or moving from one server to another. Apart from that, all works well, proving that there can be different ways to complete the same task.
Using .htaccess to control hotlinking
10th October 2020
There are times when blogs cease to exist and the only place to find the content is on the Wayback Machine. Even then, it is in danger of being lost completely. One such example is the subject of this post.
Though this website makes use of Cloudflare for various functions that include the blocking of image hotlinking, the same outcome can be achieved using .htaccess files on Apache web servers. Nginx does not read .htaccess files, so equivalent rules would need to go into its own configuration files instead. In any case, the lines that need adding to .htaccess are listed below, while the web address needs to include your own domain in place of the dummy example provided:
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?yourdomain\.com(/)?.*$ [NC]
RewriteRule .*\.(gif|jpe?g|png|bmp)$ - [F,NC]
The first line activates the mod_rewrite engine, which you might have already done. For this to work, the module must be enabled in your Apache configuration and .htaccess overrides must be permitted, which may mean modifying the main Apache configuration files if you have permission to do so. The next two lines examine the HTTP referrer string. The second line lets through requests with an empty referrer, such as direct visits. The third line permits images to be served only when the referring page is on your own web domain, not on others. To include additional domains, copy the third line and change the web address as needed. Any new lines should be placed before the final line, which specifies the file extensions that are blocked for other web addresses.
Another variant of the previous code involves changing the last line to serve a default image that tells others what is happening. That may not reduce the bandwidth usage as much as complete blocking, but it does explain to visitors of the other website why the original image is missing.
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?yourdomain\.com(/)?.*$ [NC]
RewriteRule \.(gif|jpe?g|png|bmp)$ /images/image.gif [L,NC]
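The way that the two conditions and the rule combine can be easier to follow when written out as ordinary logic. What follows is a rough Python sketch of that decision and not anything that Apache runs. The function name hotlink_decision is hypothetical, yourdomain.com is the dummy domain from above, and the referrer pattern is assumed to accept both http and https addresses.

```python
import re

# Assumption: referrers from both http and https pages on the dummy domain are allowed.
ALLOWED_REFERER = re.compile(r"^https?://(www\.)?yourdomain\.com(/)?.*$", re.IGNORECASE)
# Same image extensions as in the RewriteRule, matched case-insensitively like [NC].
IMAGE_PATH = re.compile(r"\.(gif|jpe?g|png|bmp)$", re.IGNORECASE)


def hotlink_decision(path: str, referer: str) -> str:
    """Return "serve" or "block" for a request, following the rules above."""
    if not IMAGE_PATH.search(path):
        return "serve"  # the rule only applies to image extensions
    if referer == "":
        return "serve"  # first condition: an empty referrer is let through
    if ALLOWED_REFERER.match(referer):
        return "serve"  # second condition: the page is on your own domain
    return "block"      # [F] answers with 403; the variant serves a placeholder


print(hotlink_decision("/photos/hill.jpg", "https://elsewhere.example/page"))
print(hotlink_decision("/photos/hill.jpg", "https://www.yourdomain.com/post/"))
```

Both conditions must hold before the rule fires, which is why each additional permitted domain needs its own condition line ahead of the rule.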
The wonders of mod_rewrite
24th June 2007
When I wrote about tidying dynamic URLs a little while back, I had no inkling that there would be a second part to the tale. That came with my discovery of mod_rewrite, an Apache module that facilitates URL translation. The effect is that one URL is presented to the user in the browser address bar, and the very same URL is also seen by search engines, while another is passed to the server for processing. Though it might sound like subterfuge, it works very well once you manage to get it set up properly. While the web host for my hillwalking blog/photo gallery has everything configured such that it is ready to go, the same did not apply to the offline Apache 2.2.x server that I have going on my own Windows XP box. There were two parts to getting it working there:
- Activating mod_rewrite on the server: this is as easy as uncommenting a line in the httpd.conf file for the site (the line in question is: LoadModule rewrite_module modules/mod_rewrite.so).
- Ensuring that the .htaccess file in the root of the web server directory is active. You need to set the values of the AllowOverride directives for the server root and CGI directories to All so that .htaccess is active. Not doing it for the latter will result in an error beginning with the following: Options FollowSymLinks or SymLinksIfOwnerMatch is off which implies that RewriteRule directive is forbidden. Having Allow from All set for the required directories is another option to consider when you see errors like that.
Once you have got the above sorted, add this line to .htaccess: RewriteEngine On. Preceding it with an Options directive to ensure that FollowSymLinks and SymLinksIfOwnerMatch are switched on does no harm at all and may even be needed to get things running. That done, you can set about putting mod_rewrite to work with lines like this:
RewriteRule ^pages/(.*)/?$ pages.php?query=$1
The effect of this is to take http://www.website.com/pages/input and convert it into a form for action by the server; in this case, that is http://www.website.com/pages.php?query=input. Anything matched inside a pair of brackets is assigned to a numbered variable. If you have several bracketed sections, they are assigned sequentially: $1 for the first, $2 for the second and so on. It's all good stuff when you get it going: not only does it make things look much neater, but it also helps with future-proofing. Web addresses can be kept constant over time, even if things change behind the scenes. That means that returning visitors will find what they saw the last time that they visited, which should also earn good karma in the eyes of those all-important search engines.
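For anyone wanting to try out the pattern away from the server, the same mapping can be sketched in Python, whose regular expressions behave alike for a simple case like this one. The function name rewrite is made up for illustration, and the path is given without a leading slash because that is how a rule in .htaccess sees it.

```python
import re

# Same pattern as the RewriteRule; match.group(1) plays the part of $1.
PATTERN = re.compile(r"^pages/(.*)/?$")


def rewrite(path: str) -> str:
    """Map a requested path to the internal script, or return it unchanged."""
    match = PATTERN.match(path)
    if match is None:
        return path  # the rule does not apply, so nothing changes
    return "pages.php?query=" + match.group(1)


print(rewrite("pages/input"))
print(rewrite("about/"))
```

One thing that such a test reveals is that (.*) is greedy, so a trailing slash on the requested address ends up inside the captured value instead of being dropped by the /? that follows.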