TOPIC: INTERNET BOT
Building a sitemap in XML
24th November 2022
While there are many tools that will build XML sitemaps, there is some satisfaction to be had in creating your own. That holds even though there is a multitude of search engine optimisation plugins for content management systems like WordPress, and static site generators like Hugo have sitemap generation built in. Sometimes, building your own brings added simplicity, a goal shared with recent efforts in WordPress theme development.
The sitemap XML protocol is simple enough to make for a short coding project. I based the output on what Hugo generates and used Python to create the XML files. The only libraries that I needed were configparser, SQLAlchemy and pandas. Between them, the first two allowed databases to be queried, while pandas was used for data processing. Otherwise, it was a case of using what is built into the Python language, like file writing and looping.
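A minimal sketch of the approach follows; it is not the original scripts. To stay self-contained, it swaps SQLAlchemy and pandas for Python's built-in sqlite3 module with an in-memory database. The `pages` table, its `path` and `modified` columns, the `[site]` config section and the example.com address are all hypothetical. configparser reads the base URL, and plain looping and string building produce XML in the sitemaps.org format, using `urlset`, `url`, `loc` and `lastmod` elements.

```python
from __future__ import annotations

import configparser
import sqlite3
from xml.sax.saxutils import escape

# Hypothetical settings; in practice these would live in a separate .ini file
CONFIG_TEXT = """
[site]
base_url = https://www.example.com
"""


def load_base_url(text: str) -> str:
    """Read the site's base URL from configparser-style settings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser["site"]["base_url"].rstrip("/")


def fetch_pages(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    """Return (path, last-modified date) rows; stands in for a SQLAlchemy query."""
    cur = conn.execute("SELECT path, modified FROM pages ORDER BY path")
    return cur.fetchall()


def build_sitemap(base_url: str, pages: list[tuple[str, str]]) -> str:
    """Assemble sitemap XML by looping over the database rows."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for path, modified in pages:
        lines.append("  <url>")
        # Escape &, < and > so that URLs with query strings remain valid XML
        lines.append(f"    <loc>{escape(base_url + path)}</loc>")
        lines.append(f"    <lastmod>{modified}</lastmod>")
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical in-memory database holding a couple of sample pages
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (path TEXT, modified TEXT)")
    conn.executemany(
        "INSERT INTO pages VALUES (?, ?)",
        [("/", "2022-11-24"), ("/about/", "2022-11-01")],
    )
    sitemap_xml = build_sitemap(load_base_url(CONFIG_TEXT), fetch_pages(conn))
    # Plain file writing, as with the built-in approach described above
    with open("sitemap.xml", "w", encoding="utf-8") as f:
        f.write(sitemap_xml)
```

Keeping the query, the XML assembly and the file writing in separate functions makes it straightforward to swap the sqlite3 stand-in for a real database connection without touching the XML code.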
Once the scripts were ready, they could be uploaded to web servers and run as scheduled cron jobs to keep the sitemaps up to date. Along the way, I also found a way to publicise the locations of the sitemap files to search engine bots using robots.txt. The directives take this form:
User-agent: *
Sitemap: sitemap.xml
This announces the location of the sitemap file to all bots. In my case, I always included the full URL for the XML file, which is what the sitemaps protocol expects here, and that URL naturally varies from website to website.
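One way to check that a crawler can pick up the directive is Python's built-in `urllib.robotparser`, whose `site_maps()` method (Python 3.8 and later) returns the URLs given on `Sitemap:` lines, or `None` if there are none. This is a sketch using a hypothetical example.com address; it parses the text directly rather than fetching a live robots.txt file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content that uses a full sitemap URL
ROBOTS_TXT = """\
User-agent: *
Sitemap: https://www.example.com/sitemap.xml
"""


def sitemap_urls(robots_text: str) -> list:
    """Return the sitemap URLs declared in robots.txt text (empty if none)."""
    parser = RobotFileParser()
    # parse() accepts the file as a list of lines, so no network access is needed
    parser.parse(robots_text.splitlines())
    # site_maps() returns None when no Sitemap lines are present
    return parser.site_maps() or []


print(sitemap_urls(ROBOTS_TXT))  # ['https://www.example.com/sitemap.xml']
```

`RobotFileParser` records `Sitemap:` lines regardless of which `User-agent` group they appear in, in line with the protocol treating the directive as independent of the user-agent lines.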
The scourge of comment spam
7th March 2007
My other blog is experiencing what feels like a deluge of comment spam. All that I can say is thank goodness for Akismet. And this is despite visitors having to subscribe before they can post comments; it seems that a way has been found around that. I did have a spurious user with obdolbin.com as their website address and got rid of them, but the flow still continues. Blogger does seem to have a way around this: asking commenters to type in the letters shown in an image (a CAPTCHA) to stop bots from doing their thing. Maybe we'll see WordPress doing the same?
Update: It seems that the torrent has now slowed to a trickle. Maybe getting rid of the spurious user worked after all, and it just took a while for the effect to kick in.