Ensuring that website updates make it through every cache layer and onto the web
How Things Used to Be Simple
There was a time when developing, delivering and maintaining websites was much simpler. Seeing your efforts online was a matter of uploading or updating files on a web server, and a hard refresh in the browser would show you the result. Now we have added caches here, there and everywhere in the name of making everything load faster, at the cost of added complexity.
Today, caches are found in the application layer and at the server level, and many sites add a content delivery network (CDN) on top. To see a change that you have made, you need to flush the lot, especially when you have been a good citizen and set long expiry times for files that should not change often. For example, a typical Apache configuration might look like this:
<IfModule mod_expires.c>
  # Enable expiries
  ExpiresActive On
  # Default directive
  ExpiresDefault "access plus 1 month"
  # My favicon
  ExpiresByType image/x-icon "access plus 1 year"
  # Images
  ExpiresByType image/gif "access plus 1 month"
  ExpiresByType image/png "access plus 1 month"
  ExpiresByType image/jpg "access plus 1 month"
  ExpiresByType image/jpeg "access plus 1 month"
  # CSS
  ExpiresByType text/css "access plus 1 month"
  # JavaScript
  ExpiresByType application/javascript "access plus 1 year"
</IfModule>
These settings tell browsers to keep CSS files for a month and JavaScript for a year. This is excellent for performance, but when you update one of these files, you need to override these instructions at every layer. Note that this configuration only controls what your web server tells browsers. The application layer and CDN have their own separate caching rules.
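Modern browsers favour the Cache-Control header over the older Expires header, so you may also see the same policy expressed with mod_headers. A hedged sketch of an equivalent rule for CSS, where the one-month value is chosen to match the Expires directive above:

```apache
<IfModule mod_headers.c>
  # Roughly equivalent to "access plus 1 month" for CSS, expressed as
  # Cache-Control; 2592000 seconds is 30 days
  <FilesMatch "\.css$">
    Header set Cache-Control "public, max-age=2592000"
  </FilesMatch>
</IfModule>
```

Either style has the same consequence for this article: once a browser has been told to keep a file for a month, only a deliberate flush or a changed URL will dislodge it early.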
All this is a recipe for confusion when you want to check how everything looks after making a change somewhere. You need a process for making things appear fresh again, and to understand why multiple layers must be flushed, it helps to see where these caches actually sit in your setup.
Understanding the Pipeline
All this caching means that your files travel through several systems before anyone sees them, with each system storing a copy to avoid repeatedly fetching the same file. The pipeline often looks like this:
| Layer | Examples | Actions |
|---|---|---|
| Your application | Hugo, Grav or WordPress | Reads the static files and generates HTML pages |
| Your web server | Nginx or Apache | Delivers these pages and files |
| Your CDN | Cloudflare | Distributes copies globally |
| Browsers | Chrome, Firefox, Safari | Receive and display the content |
Anything generated dynamically, for example by PHP, can flow through this pipeline freshly on every request. Someone asks for a page, your application generates it, the web server sends it, the CDN passes it along, and the browser displays it. The information flow is more immediate.
Static components like CSS, JavaScript and images work differently. They flow through the pipeline once, then each layer stores a copy. The next time someone requests such a file, each layer serves its stored version instead of asking the layer before it. This is faster, but it means your updates do not flow through automatically. HTML itself can suffer the same delay, though less so than other components, in my experience.
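One common way to sidestep stale copies entirely is to fingerprint static filenames, so each deploy produces a new URL that no layer has stored yet. A minimal sketch, where the file and its content are purely illustrative:

```shell
# Create an illustrative stylesheet, then copy it to a name that embeds
# a short hash of its content; any edit changes the hash and thus the URL.
mkdir -p css
printf 'body { margin: 0; }\n' > css/style.css
hash=$(md5sum css/style.css | cut -c1-8)
cp css/style.css "css/style.${hash}.css"
echo "reference css/style.${hash}.css in your HTML instead of css/style.css"
```

Asset pipelines in Hugo, Grav and various WordPress plugins can do this automatically, which is why the technique pairs well with the long expiry times shown earlier.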
When you change a static file, you need to tell each layer to fetch the new version. You work through the pipeline in sequence, starting where the information originates.
Step 1: Update the Application Layer
After uploading your new static files to the server, the first system that needs updating is your application. This is where information enters the pipeline, and we consider Hugo, Grav and WordPress in turn, moving from simplest to most complex. Though these are examples, some of the considerations should be useful elsewhere as well.
Hugo
Hugo is a static site generator, which makes cache management simpler than dynamic CMS systems. When you build your site, Hugo generates all the HTML files in the public/ directory. There is no application-level cache to clear because Hugo does not run on the server. After modifying your templates or content, rebuild your site:
hugo
Then, upload the new files from public/ to your web server. Since Hugo generates static HTML, the complexity is reduced to just the web server, CDN and browser layers. The application layer refresh is simply the build step on your local machine.
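Put together, the Hugo refresh amounts to a local rebuild followed by an upload. A sketch, where the host and document root are placeholders for your own details, and the guard keeps it harmless on machines without a Hugo site checked out:

```shell
# Rebuild the site if Hugo is installed and this directory holds a site;
# otherwise this is a no-op so the sketch stays safe to run anywhere.
if command -v hugo >/dev/null 2>&1 && [ -f config.toml ]; then
  hugo --minify                 # regenerates everything under public/
fi
# Then ship the result to the web server (placeholder host and path):
# rsync -avz --delete public/ user@example.com:/var/www/site/
```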
Grav CMS
Grav adds more complexity as it runs on the server and manages its own caching. When Grav reads your static files and combines them, it also compiles Twig templates that reference these files. Once you are in the Grav folder on your web server in an SSH session, issue this command to make it read everything fresh:
bin/grav clear-cache
Or manually:
rm -rf cache/* tmp/*
When someone next requests a page, Grav generates HTML that references your new static files. If you are actively developing a Grav theme, you can disable Twig and asset caching to avoid constantly clearing cache. Edit user/config/system.yaml:
cache:
  enabled: true
  check:
    method: file

twig:
  cache: false           # Disable Twig caching
  debug: true
  auto_reload: true

assets:
  css_pipeline: false    # Disable CSS combination
  js_pipeline: false     # Disable JS combination
This keeps page caching on while disabling the caches that interfere with development. Do not forget to turn everything back on before deploying to production, or website responsiveness may suffer.
WordPress
WordPress introduces the most complexity with its plugin-based architecture. Since WordPress uses plugins to build and store pages, you have to tell these plugins to rebuild so they reference your new static files. Here are some common examples that generate HTML and store it, along with how to make them refresh their cached files:
Page Cache Plugins
| Plugin | How to Clear Cache |
|---|---|
| WP Rocket | Settings > WP Rocket > Clear Cache (or use the admin bar button) |
| W3 Total Cache | Performance > Dashboard > Empty All Caches |
| WP Super Cache | Settings > WP Super Cache > Delete Cache |
| LiteSpeed Cache | LiteSpeed Cache > Dashboard > Purge All |
Redis Object Cache
Redis stores database query results, which are separate from page content. If your static file changes affect database-stored information (like theme options), tell Redis to fetch fresh data.
From the WordPress dashboard, the Redis Object Cache plugin provides Settings > Redis > Flush Cache. Alternatively, from the command line:
redis-cli FLUSHALL
Note this clears everything in Redis. If you are sharing a Redis instance with other applications, use the WordPress plugin interface instead.
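A gentler command-line option is to flush only the database that WordPress uses, leaving the rest of a shared instance alone. The sketch below assumes database 0, which is an assumption to verify against WP_REDIS_DATABASE in wp-config.php, and the guard makes it a no-op where redis-cli is absent:

```shell
# Flush only database 0, leaving other databases on a shared Redis
# instance untouched; skipped entirely if redis-cli is not installed.
if command -v redis-cli >/dev/null 2>&1; then
  redis-cli -n 0 FLUSHDB || echo "no Redis server reachable"
fi
```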
Step 2: Refresh the Web Server Cache
Once your application is reading the new static files, the next system in the pipeline is your web server. Because it has stored copies of files it previously delivered, it has to be told to fetch fresh ones from your application.
Nginx
The first step is to find where Nginx stores cached files:
grep -r "cache_path" /etc/nginx/
This shows you lines like fastcgi_cache_path /var/cache/nginx/fastcgi which tell you the location. Using this information, you can remove the stored copies:
# Clear FastCGI cache
sudo rm -rf /var/cache/nginx/fastcgi/*
# Reload Nginx
sudo nginx -s reload
When Nginx receives a request, it fetches the current version from your application instead of serving its stored copy. If you are using a reverse proxy setup, you might also have a proxy cache at /var/cache/nginx/proxy/* which you can clear the same way.
Apache
Apache uses mod_cache for storing files, and the location depends on your configuration; common locations are /var/cache/apache2/ or /var/cache/httpd/. The following commands will find your cache directory:
# Ubuntu/Debian
grep -r "CacheRoot" /etc/apache2/
# CentOS/RHEL
grep -r "CacheRoot" /etc/httpd/
Armed with the paths that you have found, you can remove the stored copies:
# Ubuntu/Debian
sudo rm -rf /var/cache/apache2/mod_cache_disk/*
sudo systemctl reload apache2
# CentOS/RHEL
sudo rm -rf /var/cache/httpd/mod_cache_disk/*
sudo systemctl reload httpd
Alternatively, if you have mod_cache_disk configured with htcacheclean, you can use:
sudo htcacheclean -t -p /var/cache/apache2/mod_cache_disk/
When Apache receives a request, it fetches the current version from your application.
Step 3: Update the CDN Layer Cache Contents
With your web server now delivering the new static files, the next system in the pipeline is your CDN. It has stored copies at edge servers worldwide, and these need to be told to fetch fresh copies from your web server. Here, Cloudflare serves as the example.
Cloudflare
First, log into the Cloudflare dashboard and navigate to Caching. Then, click "Purge Everything" and wait a few seconds. Now, when someone requests your files, Cloudflare fetches the current version from your web server instead of serving its stored copy.
If you are actively working on a file and deploying repeatedly, enable Development Mode in the Cloudflare dashboard. This tells Cloudflare to always fetch from your server rather than serving stored copies. Helpfully, it automatically turns itself off after three hours.
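Purging can also be scripted against Cloudflare's v4 API, which is handy in deployment pipelines. In this sketch, the zone ID and API token are placeholders you would supply from your dashboard, and the guard leaves it inert until both are set:

```shell
# Purge everything for one zone via the Cloudflare API; does nothing
# unless ZONE_ID and CF_API_TOKEN have been provided in the environment.
if [ -n "${ZONE_ID:-}" ] && [ -n "${CF_API_TOKEN:-}" ]; then
  curl -s -X POST "https://api.cloudflare.com/client/v4/zones/${ZONE_ID}/purge_cache" \
    -H "Authorization: Bearer ${CF_API_TOKEN}" \
    -H "Content-Type: application/json" \
    --data '{"purge_everything":true}'
fi
```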
Step 4: Refresh What Is Loaded in Your Browser
With everything upstream refreshed, we finally come to the browser, where a hard refresh forces it to discard its stored copies and fetch the page anew. The keyboard shortcuts vary by system, though holding down the Shift key while clicking the Refresh button works much of the time. Here are the usual shortcuts:
| Operating System | Keyboard Shortcut |
|---|---|
| macOS | Command + Shift + R |
| Linux | Control + Shift + R |
| Windows | Control + F5 |
These operations ensure that your updated static files make it all the way through, so you can see them along with anyone else who visits the website.
Recap
Because a lot of detail has been covered on this journey, let us remind ourselves where we have been with this final run-through. Everything follows sequentially:
- Upload your changed files to the server
- Verify the files uploaded correctly, checking that the modification time matches when you uploaded:
ls -l /path/to/your/css/file.css
- Refresh your application layer:
# Grav
bin/grav clear-cache
# WordPress - via WP-CLI
wp cache flush
# Or use your caching plugin's interface
- Refresh Redis (if you use it for object caching):
redis-cli FLUSHALL
# Or via WordPress plugin interface
- Refresh your web server layer (if using Nginx or Apache):
# Nginx
sudo rm -rf /var/cache/nginx/fastcgi/*
sudo nginx -s reload
# Apache (Ubuntu/Debian)
sudo rm -rf /var/cache/apache2/mod_cache_disk/*
sudo systemctl reload apache2
- Refresh your CDN layer: Cloudflare dashboard > Purge Everything
- Perform a forced refresh in your browser using the keyboard shortcut for your operating system
- Test the page to confirm the update has come through fully
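That final test can be automated by comparing checksums of the local and live copies of a file. In this sketch, both the path and the URL are placeholders, and a mismatch, or an unreachable site, simply reports that something is still stale:

```shell
# Compare the checksum of the local file with the one actually served;
# both the path and the URL are placeholders for your own site.
local_sum=$(md5sum public/css/style.css 2>/dev/null | cut -d' ' -f1)
remote_sum=$(curl -s https://example.com/css/style.css 2>/dev/null | md5sum | cut -d' ' -f1)
if [ -n "$local_sum" ] && [ "$local_sum" = "$remote_sum" ]; then
  echo "live copy is current"
else
  echo "still stale (or unreachable) somewhere"
fi
```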
Any changes become visible because the new files have travelled from your application through each system in the pipeline. Sometimes, this may happen seamlessly without intervention, but it is best to know what to do when that is not how things proceed.
Related Reading
Performing parallel processing in Perl scripting with the Parallel::ForkManager module
In a previous post, I described how to add Perl modules in Linux Mint, while mentioning that I hoped to add another that discusses the use of the Parallel::ForkManager module. This is that second post, and I am going to keep things as simple and generic as they can be. There are other articles like one on the Perl Maven website that go into more detail.
The first thing to do is ensure that the Parallel::ForkManager module is loaded by your script; placing the line of code below near the top of the file does just that. Without this step, the script will not be able to find the required module and errors will be generated.
use Parallel::ForkManager;
Then, the maximum number of child processes needs to be specified. While that could be achieved with a simple variable assignment, the following line reads the number from the command used to invoke the script, and even tells a forgetful user what they need to do, in its own terse manner. Here $0 is the name of the script and N is the number of child processes. Not all of these will necessarily run at once, since available processing capacity limits how many are active, which reduces the chance of overwhelming a CPU.
my $forks = shift or die "Usage: $0 N\n";
Once the maximum number of child processes is known, the next step is to instantiate the Parallel::ForkManager object as follows:
my $pm = Parallel::ForkManager->new($forks);
With the Parallel::ForkManager object available, it is now possible to use it as part of a loop. A foreach loop works well, though it iterates over a single array; hashes are needed when other collections require interrogation. Two extra statements are needed: one to start a child process and another to end it.
foreach my $t (@array) {
    my $pid = $pm->start and next;   # fork; the parent moves straight to the next item
    # ... code to be processed in the child ...
    $pm->finish;                     # terminate the child process
}
Since a script often performs other processing too, and can contain more than one parallelised loop, there needs to be a way for the parent process to wait until all the child processes have completed before moving from one step to another, and that is what the following statement does. In short, it adds more control.
$pm->wait_all_children;
To close, a comment on the advantages of parallel processing is in order. Modern multicore processors often get used for single-threaded operations, which leaves most of their capacity idle, and utilising this extra power can shorten processing times markedly. To give you an idea of what can be achieved, I had a script taking around 2.5 minutes to complete in single-threaded mode, while setting the maximum number of child processes to 24 reduced this to just over half a minute while using 80% of the processing capacity. This was with an AMD Ryzen 7 2700X CPU with eight cores and a maximum of 16 processor threads. Surprisingly, using 16 as the maximum only used half the processor capacity, so it seems to be a matter of performing one's own measurements when making these decisions.
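Those timings can be reproduced with the shell's time builtin, running the same script at different fork counts. Here, process.pl is a hypothetical name standing in for your own script, and the loop is skipped when no such file exists:

```shell
# Time the hypothetical process.pl at several fork counts to find the
# sweet spot for your own CPU; skipped when the script is absent.
for n in 1 8 16 24; do
  if [ -f process.pl ]; then
    echo "=== $n forks ==="
    time perl process.pl "$n"
  fi
done
```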
Interrogating Solaris hardware for installed CPU and memory resources
There are times when working with a Solaris server that you need to know a little more about the hardware configuration. Knowing how much memory is installed and how many processors there are helps you avoid hogging those resources.
The command for revealing how much memory has been installed is:
prtconf -v
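Since prtconf -v produces pages of device detail, it is worth knowing that the memory figure appears near the top of the plain prtconf output, where grep can pick it out. The guard keeps this sketch runnable only on systems that actually have the command:

```shell
# On Solaris, prtconf's opening lines include a "Memory size:" entry;
# filter for it rather than wading through the verbose output.
if command -v prtconf >/dev/null 2>&1; then
  prtconf 2>/dev/null | grep "Memory size" || true
fi
```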
Since memory is often allocated to individual CPUs, knowing how many are on the system is a must. This command gives you the bare number:
psrinfo -p
The following variant provides the full detail that you see below it:
psrinfo -v
Output:
Status of virtual processor 0 as of: 10/06/2008 16:47:54
on-line since 09/13/2008 14:47:52.
The sparcv9 processor operates at 1503 MHz,
and has a sparcv9 floating point processor.
Status of virtual processor 1 as of: 10/06/2008 16:47:54
on-line since 09/13/2008 14:47:49.
The sparcv9 processor operates at 1503 MHz,
and has a sparcv9 floating point processor.
For a level of detail between these two extremes, try this to get what you see below it:
psrinfo -vp
Output:
The physical processor has 1 virtual processor (0)
UltraSPARC-IIIi (portid 0 impl 0x16 ver 0x34 clock 1503 MHz)
The physical processor has 1 virtual processor (1)
UltraSPARC-IIIi (portid 1 impl 0x16 ver 0x34 clock 1503 MHz)