TOPIC: SEARCH ENGINE OPTIMIZATION
Building a sitemap in XML
24th November 2022While there are many tools that will build XML site maps, there is some satisfaction to be had in creating your own. This is despite there being a multitude of search engine optimisation plugins for content management systems like WordPress or what is built into static site generators like Hugo. Sometimes, building your own allows for added simplicity, and that is shared with recent efforts in WordPress theme development.
The sitemap XML protocol is simple enough to offer a short coding project. The basis was what Hugo generates, and I used Python to create the XML files. The only libraries that I needed were configparser
, SQLAlchemy and pandas. The first two of these allowed databases to be queried, and the last on the list was used for data processing. Otherwise, it was a case of using what is built into the Python language, like file writing and looping.
Once the scripts were ready, they could be uploaded to web servers and executed by scheduled jobs using CRON to keep things up to date. Along the way, I also uncovered a way to publicise the locations of the sitemap files to search engine bots using robots.txt. The structure of the instruction is the following:
User-agent: *
Sitemap: sitemap.xml
This means that it announces to all bots the location of the sitemap file. In my case, I always included the full URL for the XML file, and that clearly varies by website location.
The wonders of mod_rewrite
24th June 2007When I wrote about tidying dynamic URL's a little while back, I had no inkling that that would be a second part to the tale. My discovery of mod_rewrite, an Apache module that facilitates URL translation. The effect is that one URL is presented to the user in the browser address bar, and the very same URL is also seen by search engines, while another is passed to the server for processing. Though it might sound like subterfuge, it works very well once you manage to get it set up properly. While the web host for my hillwalking blog/photo gallery has everything configured such that it is ready to go, the same did not apply to the offline Apache 2.2.x server that I have going on my own Windows XP box. There were two parts to getting it working there:
- Activating mod-rewrite on the server: this is as easy as uncommenting a line in the
httpd.conf
file for the site (the line in question is:LoadModule rewrite_module modules/mod_rewrite.so
). - Ensuring that the
.htaccess
file in the root of the web server directory is active. You need to set the values of theAllowOverride
directives for the server root and CGI directories toAll
so that.htaccess
is active. Not doing it for the latter will result in an error beginning with the following:Options FollowSymLinks or SymLinksIfOwnerMatch is off which implies that
. HavingRewriteRule
directive is forbiddenAllow from All
set for the required directories is another option to consider when you see errors like that.
Once you have got the above sorted, add this line to .htaccess
: RewriteEngine On
. Preceding it with an Options
directive to ensure that FollowSymLinks
and SymLinksIfOwnerMatch
are switched on does no harm at all and may even be needed to get things running. That done, you can set about putting mod_write to work with lines like this:
RewriteRule ^pages/(.*)/?$ pages.php?query=$1
The effect of this is to take http://www.website.com/pages/input
and convert it into a form for action by the server; in this case, that is http://www.website.com/pages.php?query=input
. Anything contained by a bracket is assigned to the value of a system-named variable. If you have several bracketed sections, they are assigned to sequentially numbered variables as follows: $1
for the first, $2
for the second and so on. It's all good stuff when you get it going, and not only does it make things look much neater, but it also possesses an advantage when it comes to future-proofing too. Web addresses can be kept constant over time, even if things change behind the scenes. It means that any returning visitors will find what they saw the last time that they visited and surely must ensure good karma in the eyes of those all important search engines.