Technology Tales

Notes drawn from experiences in consumer and enterprise technology

TOPIC: WEB SERVER

Ensuring that website updates make it through every cache layer and onto the web

17th February 2026

How Things Used to Be Simple

There was a time when life was much simpler when it came to developing, delivering and maintaining websites. Essentially, seeing your efforts online was a matter of storing or updating your files on a web server, and a hard refresh in the browser would render the updates for you. Now we have added caches here, there and everywhere in the name of making everything load faster, at the cost of added complexity.

Today, these caches are found in the application layer and at the server level, and we have added content delivery network (CDN) systems on top of those. When trying to see a change that you made, you need to flush the lot, especially when you have been a good citizen and set long persistence times for files that should not change so often. For example, a typical Apache configuration might look like this:

<IfModule mod_expires.c>
# Enable expiries
ExpiresActive On 
# Default directive
ExpiresDefault "access plus 1 month"
# My favicon
ExpiresByType image/x-icon "access plus 1 year"
# Images
ExpiresByType image/gif "access plus 1 month"
ExpiresByType image/png "access plus 1 month"
ExpiresByType image/jpg "access plus 1 month"
ExpiresByType image/jpeg "access plus 1 month"
# CSS
ExpiresByType text/css "access plus 1 month"
# JavaScript
ExpiresByType application/javascript "access plus 1 year"
</IfModule>

These settings tell browsers to keep CSS files for a month and JavaScript for a year. This is excellent for performance, but when you update one of these files, you need to override these instructions at every layer. Note that this configuration only controls what your web server tells browsers. The application layer and CDN have their own separate caching rules.
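
If you want to check what any given layer is actually telling browsers, inspecting the response headers is a quick way to do it. Here is a minimal sketch, with the URL standing in for one of your own static files:

curl -sI https://www.example.com/css/style.css | grep -iE 'cache-control|expires|age'

The Cache-Control and Expires headers show the persistence times in force, while an Age header suggests that a cache somewhere along the way served the response.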

All this is a recipe for confusion when you want to see how everything looks after making a change somewhere, so you need a process for making things appear new again. To understand why you need to flush multiple layers, it helps to see where these caches actually sit in your setup.

Understanding the Pipeline

Your files travel through several systems before anyone sees them, with each system storing a copy to avoid repeatedly fetching the same file. The pipeline often looks like this:

Layer | Examples | Actions
Your application | Hugo, Grav or WordPress | Reads the static files and generates HTML pages
Your web server | Nginx or Apache | Delivers these pages and files
Your CDN | Cloudflare | Distributes copies globally
Browsers | Chrome, Firefox, Safari | Receive and display the content

Anything generated dynamically, for example by PHP, can flow through this pipeline freshly on every request. Someone asks for a page, your application generates it, the web server sends it, the CDN passes it along, and the browser displays it. The information flow is more immediate.

Static components like CSS, JavaScript and images work differently. They flow through the pipeline once, then each layer stores a copy. The next time someone requests that file, each layer serves its stored version instead of asking the previous layer. This is faster, but it means your updates do not flow through automatically. HTML itself can suffer from this staleness too, though not as much as other components, in my experience.

When you change a static file, you need to tell each layer to fetch the new version. You work through the pipeline in sequence, starting where the information originates.

Step 1: Update the Application Layer

After uploading your new static files to the server, the first system that needs updating is your application. This is where information enters the pipeline, and we consider Hugo, Grav and WordPress in turn, moving from simplest to most complex. Though these are examples, some of the considerations should be useful elsewhere as well.

Hugo

Hugo is a static site generator, which makes cache management simpler than dynamic CMS systems. When you build your site, Hugo generates all the HTML files in the public/ directory. There is no application-level cache to clear because Hugo does not run on the server. After modifying your templates or content, rebuild your site:

hugo

Then, upload the new files from public/ to your web server. Since Hugo generates static HTML, the complexity is reduced to just the web server, CDN and browser layers. The application layer refresh is simply the build step on your local machine.
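
How you perform the upload is up to you; as a sketch, assuming SSH access and a destination path that is only an example, rsync keeps the server copy in step with the local build:

rsync -avz --delete public/ user@server:/var/www/example.com/

The --delete switch removes files on the server that no longer exist in public/, which stops renamed or retired assets from lingering.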

Grav CMS

Grav adds more complexity as it runs on the server and manages its own caching. When Grav reads your static files and combines them, it also compiles Twig templates that reference these files. Once you are in the Grav folder on your web server in an SSH session, issue this command to make it read everything fresh:

bin/grav clear-cache

Or manually:

rm -rf cache/* tmp/*

When someone next requests a page, Grav generates HTML that references your new static files. If you are actively developing a Grav theme, you can disable Twig and asset caching to avoid constantly clearing cache. Edit user/config/system.yaml:

cache:
  enabled: true
  check:
    method: file

twig:
  cache: false  # Disable Twig caching
  debug: true
  auto_reload: true

assets:
  css_pipeline: false  # Disable CSS combination
  js_pipeline: false   # Disable JS combination

This keeps page caching on while disabling the caches that interfere with development, but do not forget to turn them back on before deploying to production; leaving them off may impact website responsiveness.

WordPress

WordPress introduces the most complexity with its plugin-based architecture. Since WordPress uses plugins to build and store pages, you have to tell these plugins to rebuild so they reference your new static files. Here are some common examples that generate HTML and store it, along with how to make them refresh their cached files:

Page Cache Plugins

Plugin | How to Clear Cache
WP Rocket | Settings > WP Rocket > Clear Cache (or use the admin bar button)
W3 Total Cache | Performance > Dashboard > Empty All Caches
WP Super Cache | Settings > WP Super Cache > Delete Cache
LiteSpeed Cache | LiteSpeed Cache > Dashboard > Purge All
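
If you have shell access, WP-CLI can save a trip to the dashboard. The commands below are generic rather than plugin-specific, so treat them as a starting point:

# Flush the persistent object cache (when a caching drop-in is installed)
wp cache flush

# Remove all transients stored in the database
wp transient delete --all
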
Redis Object Cache

Redis stores database query results, which are separate from page content. If your static file changes affect database-stored information (like theme options), tell Redis to fetch fresh data.

From the WordPress dashboard, the Redis Object Cache plugin provides: Settings > Redis > Flush Cache. To do likewise from the command line:

redis-cli FLUSHALL

Note this clears everything in Redis. If you are sharing a Redis instance with other applications, use the WordPress plugin interface instead.
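
If other applications share the Redis instance, a narrower option is to flush only the database that WordPress uses. The sketch below assumes the default database index of 0; check the WP_REDIS_DATABASE setting in wp-config.php if yours differs:

# Flush only database 0 rather than the whole instance
redis-cli -n 0 FLUSHDB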

Step 2: Refresh the Web Server Cache

Once your application is reading the new static files, the next system in the pipeline is your web server. Because it has stored copies of files it previously delivered, it has to be told to fetch fresh copies from your application.

Nginx

The first step is to find where Nginx stores files:

grep -r "cache_path" /etc/nginx/

This shows you lines like fastcgi_cache_path /var/cache/nginx/fastcgi which tell you the location. Using this information, you can remove the stored copies:

# Clear FastCGI cache
sudo rm -rf /var/cache/nginx/fastcgi/*

# Reload Nginx
sudo nginx -s reload

When Nginx receives a request, it fetches the current version from your application instead of serving its stored copy. If you are using a reverse proxy setup, you might also have a proxy cache at /var/cache/nginx/proxy/* which you can clear the same way.
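
To confirm whether Nginx served a stored copy or fetched afresh, you can expose the cache status in a response header. This is a sketch that assumes a FastCGI cache is already configured for the relevant server or location block:

# Add inside the server or location block that uses the cache
add_header X-Cache-Status $upstream_cache_status;

A request made with curl -sI then reports HIT, MISS, BYPASS and so on, making it obvious whether the purge has worked.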

Apache

Apache uses mod_cache for storing files, and the location depends on your configuration. Even so, common locations are /var/cache/apache2/ or /var/cache/httpd/. Using the following commands, you can find your cache directory:

# Ubuntu/Debian
grep -r "CacheRoot" /etc/apache2/

# CentOS/RHEL
grep -r "CacheRoot" /etc/httpd/

Armed with the paths that you have found, you can remove the stored copies:

# Ubuntu/Debian
sudo rm -rf /var/cache/apache2/mod_cache_disk/*
sudo systemctl reload apache2

# CentOS/RHEL
sudo rm -rf /var/cache/httpd/mod_cache_disk/*
sudo systemctl reload httpd

Alternatively, if you have mod_cache_disk configured with htcacheclean, you can use:

sudo htcacheclean -t -p /var/cache/apache2/mod_cache_disk/

When Apache receives a request, it fetches the current version from your application.
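
Should the cache directories prove elusive, it is worth confirming that the caching modules are loaded at all before digging further. A quick check looks like this:

# Ubuntu/Debian
apache2ctl -M | grep -i cache

# CentOS/RHEL
httpd -M | grep -i cache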

Step 3: Update the CDN Layer Cache Contents

Once your web server is delivering the new static files, the next system in the pipeline is your CDN. This has stored copies on edge servers worldwide, and those need to be told to fetch fresh copies from your web server. Here, Cloudflare is given as an example.

Cloudflare

First, log into the Cloudflare dashboard and navigate to Caching. Then, click "Purge Everything" and wait a few seconds. Now, when someone requests your files, Cloudflare fetches the current version from your web server instead of serving its stored copy.
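
Cloudflare also offers an API for this, which is handy in deployment scripts. The following is a sketch in which the zone ID and API token are placeholders to be filled in from your own account:

curl -X POST "https://api.cloudflare.com/client/v4/zones/YOUR_ZONE_ID/purge_cache" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  --data '{"purge_everything":true}'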

If you are actively working on a file and deploying repeatedly, enable Development Mode in the Cloudflare dashboard. This tells Cloudflare to always fetch from your server rather than serving stored copies. Helpfully, it automatically turns itself off after three hours.

Step 4: Refresh What Is Loaded in Your Browser

Having got everything else along the pipeline refreshed, we finally come to the browser. This is where we perform a hard refresh, forcing the browser to discard its stored copies and fetch the content afresh. Holding down the Shift key and clicking the Refresh button works a lot of the time, though there also are keyboard shortcuts that vary by operating system; here are some suggestions:

Operating System | Keyboard Shortcut
macOS | Command + Shift + R
Linux | Control + Shift + R
Windows | Control + F5

Making use of these operations ensures that your static files come through the whole way so you can see them along with anyone else who visits the website.

Recap

Because a lot of detail has been covered on this journey, let us remind ourselves where we have been with this final run-through. Everything follows sequentially:

  1. Upload your changed files to the server
  2. Verify the files uploaded correctly:
    ls -l /path/to/your/css/file.css

    Check the modification time matches when you uploaded.

  3. Refresh your application layer:
    # Grav
    bin/grav clear-cache
    
    # WordPress - via WP-CLI
    wp cache flush
    # Or use your caching plugin's interface
  4. Refresh Redis (if you use it for object caching):
    redis-cli FLUSHALL
    # Or via WordPress plugin interface
  5. Refresh your web server layer (if using Nginx or Apache):
    # Nginx
    sudo rm -rf /var/cache/nginx/fastcgi/*
    sudo nginx -s reload
    
    # Apache (Ubuntu/Debian)
    sudo rm -rf /var/cache/apache2/mod_cache_disk/*
    sudo systemctl reload apache2
  6. Refresh your CDN layer: Cloudflare dashboard > Purge Everything
  7. Perform a forced refresh in your browser using the keyboard shortcut appropriate to your operating system
  8. Test the page to confirm the update has come through fully

Any changes become visible because the new files have travelled from your application through each system in the pipeline. Sometimes, this may happen seamlessly without intervention, but it is best to know what to do when that is not how things proceed.

Related Reading

Fixing Crontab editor permissions for www-data

3rd September 2025

There are times when I set jobs to run using the web server account www-data for various website maintenance tasks. To ensure that I am doing this for the right account, I issue the following command:

sudo -u www-data crontab -e

However, doing so on this server yielded the following:

touch: cannot touch '/var/www/.selected_editor': Permission denied
Unable to create directory
/var/www/.local/share/nano/: No such file or directory
It is required for saving/loading search history or cursor positions.
No modification made

While things otherwise worked as they should with nano as the editor, I felt it best to avoid such output if I could. Thus, I modified the command like this:

sudo -u www-data HOME=/tmp crontab -e

This sets the home directory for www-data to /tmp, allowing an editor to be chosen, at least on an ephemeral basis. The root cause of the messages is that www-data is not a user account like others and does not get a home area. The workaround above avoids that problem without the artificiality of creating a www-data folder in the /home directory. Some might get around the whole business using the Vi editor, but nano suits me better.
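
The same idea extends to reviewing what is already scheduled for the account, should you want to check before editing:

sudo -u www-data HOME=/tmp crontab -l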

How complexity can blind you

17th August 2025

Visitors may not have noticed it, but I was having a lot of trouble with this website. Intermittent slowdowns beset any attempt to add new content or perform any other administration. This was not happening on any other web portal that I had, even one sharing the same publishing software.

Even so, WordPress did get the blame at first, at least until deactivating plugins proved to have no effect. Then, it was the turn of the web server, resulting in a move to something more powerful and my leaving Apache for Nginx at the same time. Redis caching was another suspect, especially when things got in a twist on the new instance. As if that were not enough, MySQL came in for some scrutiny too.

Finally, another suspect emerged: Cloudflare. Either some settings got mangled or something else was happening, but cutting out that intermediary was enough to make things fly again. Now, I use bunny.net for DNS duties instead, and the simplification has helped enormously; the previous layering was no help with debugging. With a bit of care, I might add some other tools behind the scenes while taking things slowly to avoid confusion in the future.

Removing query strings from any URL on an Nginx-powered website

12th April 2025

My public transport website is produced using Hugo and is hosted on a web server with Nginx. Usually, I use Apache, so this is an exception. When Google highlighted some duplication caused by unneeded query strings, I set to work. However, anything to do with URLs, such as redirection, cannot be done with a .htaccess file or mod_rewrite on Nginx. Thus, such clauses have to go somewhere else and take a different form.

In my case, the configuration file to edit is /etc/nginx/sites-available/default because that was what was enabled. Once I had that open, I needed to find the following block:

location / {
        # First attempt to serve request as file, then
        # as directory, then fall back to displaying a 404.
        try_files $uri $uri/ =404;
}

Because I have one section for port 80 and another for port 443, there were two locations that I needed to update due to duplication, though I may have got away without altering the second of these. After adding the redirection clause, the block became:

location / {
        # First attempt to serve request as file, then
        # as directory, then fall back to displaying a 404.
        try_files $uri $uri/ =404;

        # Remove query strings only when necessary
        if ($args) {
                rewrite ^(.*)$ $1? permanent;
        }
}

The result of the addition is a permanent (301) redirection whenever arguments are passed in a query string. The $1? portion is the rewritten URL: the path captured by the initial ^(.*)$ portion with an empty query string appended. In other words, it redirects from the original address to a new one containing only the part preceding the question mark.

Handily, Nginx allows you to test your updated configuration using the following command:

sudo nginx -t

That helped me with some debugging. Once all was in order, I needed to reload the service by issuing this command:

sudo systemctl reload nginx

With Apache, there is no need to restart the service after updating the .htaccess file, which adds some convenience. The different file locations also mean taking some care with backups when upgrading the operating system or moving from one server to another. Apart from that, all works well, proving that there can be different ways to complete the same task.

Making the LanguageTool embedded HTTP Server work on Windows 11

11th August 2024

My choice of Markdown editor is VS Code or VSCodium, the latter being a fork of the former with Microsoft telemetry removed. In either case, I use the LanguageTool Linter extension for the required grammar and spelling checks. Pointing that to the remote web service offered by LanguageTool could get punitive, even if I am a subscriber. Thus, I use a locally installed equivalent instead.

In my usual Linux system, that is how I work. However, I have replicated the set-up on a Windows laptop for added flexibility. This needed the JRE, so that was downloaded from the Oracle website and then installed. The next step was to download the LanguageTool embedded HTTP Server zip file and decompress it to a chosen location. To run the server, a command like the following is issued from the Windows Terminal (the single line may break over two here):

java -cp "[Chosen Location]\LanguageTool-stable\LanguageTool-6.4\languagetool-server.jar" org.languagetool.server.HTTPServer --port 8081 --allow-origin

That is enough to get things going because it fulfils the default settings of the LanguageTool Linter extension in VS Code or VSCodium. The fastText application is unavailable for Windows, so I did without it. So far, things are operating acceptably, even if there is a way to address more memory should that be required.
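
Should more memory ever be required, the usual JVM switches apply. The line below is a sketch that caps the heap at 2 GB, a figure chosen purely for illustration:

java -Xmx2g -cp "[Chosen Location]\LanguageTool-stable\LanguageTool-6.4\languagetool-server.jar" org.languagetool.server.HTTPServer --port 8081 --allow-origin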

Dealing with the following message issued when using Certbot on Apache: "Unable to find corresponding HTTP vhost; Unable to create one as intended addresses conflict; Current configuration does not support automated redirection"

12th April 2024

When doing something with Certbot on another website not so long ago, I encountered the message above (semicolons have been added to separate its lines) when executing the following command:

sudo certbot --apache

The solution was to open /etc/apache2/sites-available/000-default.conf using nano and update the ServerName field (or the line containing this keyword) so it matched the address used for setting up Let's Encrypt SSL certificates. The mention of Apache in the above does make the solution specific to this web server software, so you will need another solution if you meet this kind of problem when using Nginx or another web server.
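
For reference, the relevant part of 000-default.conf looks something like the following once edited; the domain is a stand-in for the one covered by your certificate:

<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot /var/www/html
</VirtualHost>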

Removing redundant kernels from Ubuntu

29th October 2022

Recently, a message appeared on some web servers of mine, exhorting me to upgrade to Ubuntu 22.04.1 using the do-release-upgrade command. In the interests of remaining current, I did just that, only to get another message, one like the following:

The upgrade needs a total of [amount of space with units] free space on disk `/boot`.
Please free at least an additional [amount of space with units] of disk space on `/boot`.
Empty your trash and remove temporary packages of former installations
using `sudo apt-get clean`.

Using sudo apt-get clean did not resolve the problem, so the advice given was of no use. The actual problem, as searching around the web revealed, was that there were too many old kernels cluttering up /boot. What also came up was a single command for resolving the problem. However, removing the wrong kernel can wreck a system, so I took a more cautious approach. First, I listed the kernels to be removed and checked that they did not include the currently running one, whose details were found by running uname -r. The listing was done with the following command (broken up over several lines for clarity, using the backslash character to denote continuation):

dpkg -l linux-{image,headers}-"[0-9]*" \
| awk '/ii/{print $2}' \
| grep -ve "$(uname -r \
| sed -r 's/-[a-z]+//')"

The dpkg command listed the installed kernels, with awk, grep and sed filtering out unwanted sections of the text. The awk command takes the tabular output from dpkg and turns it into a list. The -v switch on the grep command selects the lines that do not match the search expression created by the sed command, while the -e switch makes grep treat its argument as a pattern. The sed command removes the trailing letters (the flavour suffix, such as -generic) from the output of the uname command, whose -r switch produces the kernel release details, to leave only the release number of the current kernel. On being satisfied that nothing untoward would happen, the full command below (also broken up over several lines for clarity, using the backslash character to denote continuation) could be executed.

sudo apt purge $(dpkg -l linux-{image,headers}-"[0-9]*" \
| awk '/ii/{print $2}' \
| grep -ve "$(uname -r \
| sed -r 's/-[a-z]+//')")

This told apt to purge the unwanted kernels, thus freeing up enough space for the upgrade to continue. That happened without significant incident, though there were some remediations needed on the PHP side to get the website working smoothly again.
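
On more recent Ubuntu releases, much of this housekeeping can often be done in one go, though it remains wise to check the running kernel first. A sketch:

# Confirm the running kernel, then let apt remove what is no longer needed
uname -r
sudo apt autoremove --purge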

A quick look at the 7G Firewall

17th October 2021

There is a simple principle behind the 7G Firewall from Perishable Press: it is a set of mod_rewrite rules for the Apache web server that can be added to a .htaccess file, and there is a version for the Nginx web server as well. These check query strings, request Uniform Resource Identifiers (URIs), user agents, remote hosts, HTTP referrers and request methods for anomalies, and block any requests that appear dubious.

Unfortunately, I found that the rules heavily slowed down a website with which I tried them, so I am going to have to wait until that is moved to a faster system before I really can give them a go. This can be a problem with security tools, as I also found with adding a modsec jail to a Fail2Ban instance. As it happens, both sets of observations were made using the GTMetrix tool, making it appear that there is a trade-off between security and speed that needs to be assessed before adding anything to block unwanted web visitors.

Using .htaccess to control hotlinking

10th October 2020

There are times when blogs cease to exist and the only place to find the content is on the Wayback Machine. Even then, it is in danger of being lost completely. One such example is the subject of this post.

Though this website makes use of the facilities of Cloudflare for various functions that include the blocking of image hotlinking, the same outcome can be achieved using .htaccess files on Apache web servers. It may work on Nginx to a point too, but there its own configuration files ought to be updated instead, since some frown upon the .htaccess approach. In any case, the lines that need adding to .htaccess are listed below; the web address needs to include your own domain in place of the dummy example provided:

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?yourdomain.com(/)?.*$ [NC]
RewriteRule .*\.(gif|jpe?g|png|bmp)$ - [F,NC]

The first line activates the mod_rewrite engine, which you might have done already. For this to work, the module must be enabled in your Apache configuration, and you need permission to make these changes. The next two lines examine the HTTP referrer strings, with the third permitting images to be served only from your own web domain, not from others. To include additional domains, copy the third line and change the web address as needed. Any new lines should be placed before the final line, which specifies the file extensions that are blocked for other web addresses.

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?yourdomain.com(/)?.*$ [NC]
RewriteRule \.(gif|jpe?g|png|bmp)$ /images/image.gif [L,NC]

Another variant of the previous code, shown above, involves changing the last line to display a default image instead of blocking outright. That may not reduce the bandwidth usage as much as complete blocking, but it may be more useful for telling others what is happening.
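
For either variant to take effect, mod_rewrite has to be enabled. On Debian or Ubuntu systems, that is usually a matter of the following, assuming you have sudo rights:

sudo a2enmod rewrite
sudo systemctl reload apache2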

Running cron jobs using the www-data system account

22nd December 2018

When you set up your own web server or use a private server (virtual or physical), you will find that web servers run using the www-data account. That means that website files need to be accessible to that system account, if not owned by it. The latter is mandatory if you want WordPress to be able to update itself without needing FTP details.

It also means that you probably need scheduled jobs to be executed using the privileges possessed by the www-data account. For instance, I use WP-CLI to automate spam removal and updates to plugins, themes and WordPress itself. Spam removal can be done without the www-data account, but the updates need file access and cannot be completed without this. Therefore, I got interested in setting up cron jobs to run under that account and the following command helps to address this:

sudo -u www-data crontab -e

For that to work, your own account needs to be listed in /etc/sudoers or be assigned to the sudo group in /etc/group. If it is either of those, then entering your own password will open the cron file for www-data, and it can be edited as for any other account. Closing and saving the session will update cron with the new job details.
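
As an illustration of what such a cron file might contain, here is a hypothetical entry that updates plugins nightly using WP-CLI; the path and the schedule are examples only:

# Update all WordPress plugins at 02:30 every night
30 2 * * * cd /var/www/example.com && wp plugin update --all --quiet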

In fact, the same approach can be taken for a variety of commands where files can only be accessed using www-data. This includes copying, pasting and deleting files, as well as executing WP-CLI commands. WP-CLI itself issues a striking message if you run a command using the root account, a pervasive temptation given what it allows; any alternative has to be better from a security standpoint.
