TOPIC: DOCKER
Online R programming books that are worth bookmarking
As part of making content more useful following its reorganisation, numerous articles on the R statistical computing language have appeared here. All of those have taken a more narrative form. With this collation of online books on the R language, I take a different approach: what you find below is a collection of links with associated descriptions. While narrative accounts can be very useful, there is something handy about running one's eye down a compilation as well. Many entries have a corresponding print edition, some of which are not cheap to buy, which makes me wonder about the economics of posting the content online as well, though doing so can help with gathering feedback during book preparation.
We start with this comprehensive collection of over 400 free and affordable resources related to the R programming language, organised into categories such as data science, statistics, machine learning and specific fields like economics and life sciences. In many ways, it is a superset of what you find below and complements this collection with many other finds. The fact that it is a living collection makes it even more useful.
R Programming for Data Science
Here is an introduction to the R programming language, focusing on its application in data science. It covers foundational topics such as installation, data manipulation, function writing, debugging and code optimisation, alongside advanced concepts like parallel computation and data analysis case studies. The text includes practical guidance on handling data structures, using packages such as {dplyr} and {readr} as well as working with dates, times and regular expressions. Additional sections address control structures, scoping rules and profiling techniques, while the author also discusses resources for staying updated through a podcast and accessing e-book versions for ongoing revisions.
Designed for individuals with no prior coding experience, the book provides an introduction to programming in R while using practical examples to teach fundamental concepts such as data manipulation, function creation and the use of R's environment system. It is structured around hands-on projects, including simulations of weighted dice, playing cards and a slot machine, alongside explanations of core programming principles like objects, notation, loops and performance optimisation. Additional sections cover installation, package management, data handling and debugging techniques. The book is written using R Markdown and published under a Creative Commons licence, and a physical edition is available through O'Reilly.
What you have here is one of several books written by Hadley Wickham. This one is published in its second edition as part of Chapman and Hall's R Series and is aimed primarily at R users who want to deepen their programming skills and understanding of the language, though it is also useful for programmers migrating from other languages. The book covers a broad range of topics organised into sections on foundations, functional programming, object-oriented programming, metaprogramming and techniques, with the latter including debugging, performance measurement and rewriting R code in C++.
Unlike Paul Teetor's separately published R Cookbook, the Cookbook for R was created by Winston Chang. It offers solutions to common tasks and problems in data analysis, covering topics such as basic operations, numbers, strings, formulas, data input and output, data manipulation, statistical analysis, graphs, scripts and functions, and tools for experiments.
The second edition of R for Data Science by Hadley Wickham, Mine Çetinkaya-Rundel and Garrett Grolemund offers a structured approach to learning data science with R, covering essential skills such as data visualisation, transformation, import, programming and communication. Organised into chapters that explore workflows, data manipulation techniques and tools like Quarto for reproducible research, the book emphasises practical applications and best practices for handling data effectively.
The R Graphics Cookbook, 2nd edition, offers a comprehensive guide to creating visualisations in R, structured into chapters that cover foundational skills such as installing and using packages, loading data from various formats and exploring datasets through basic plots. It progresses to detailed techniques for constructing bar graphs, line graphs, scatter plots and histograms, alongside methods for customising axes, annotations, themes and legends.
The book also addresses advanced topics like colour application, faceting data into subplots, generating specialised graphs such as network diagrams and heat maps and preparing data for visualisation through reshaping and summarising. Additional sections focus on refining graphical outputs for presentation, including exporting to different file formats and adjusting visual elements for clarity and aesthetics, while an appendix provides an overview of the {ggplot2} system.
R Markdown: The Definitive Guide
Published by Chapman & Hall/CRC, R Markdown: The Definitive Guide by Yihui Xie, J.J. Allaire and Garrett Grolemund covers the R Markdown document format, which has been in use since 2012 and is built on the knitr and Pandoc tools. The format allows users to embed code within Markdown documents and compile the results into a range of output formats including PDF, HTML and Word. The guide covers a broad scope of practical applications, from creating presentations, dashboards, journal articles and books to building interactive applications and generating blogs, reflecting how the ecosystem has matured since the {rmarkdown} package was first released in 2014.
A key principle running throughout is that Markdown's deliberately limited feature set is a strength rather than a drawback, encouraging authors to focus on content rather than complex typesetting. Despite this simplicity, the format remains highly customisable through tools such as Pandoc templates, LaTeX and CSS. Documents produced in R Markdown are also notably portable, as their straightforward syntax makes conversion between output formats more reliable, and because results are generated dynamically from code rather than entered manually, they are far more reproducible than those produced through conventional copy-and-paste methods.
The R Markdown Cookbook is a practical guide designed to help users enhance their ability to create dynamic documents by combining analysis and reporting. It covers essential topics such as installation, document structure, formatting options and output formats like LaTeX, HTML and Word, while also addressing advanced features such as customisations, chunk options and integration with other programming languages. The book provides step-by-step solutions to common tasks, drawing on examples from online resources and community discussions to offer clear, actionable advice for both new and experienced users seeking to improve their workflow and explore the full potential of R Markdown.
This book provides a practical guide to using R Markdown for scientists, developed from a three-hour workshop and designed to evolve as a living resource. It covers essential topics such as setting up R Markdown documents, integrating with RStudio for efficient workflows, exporting outputs to formats like PDF, HTML and Word, managing figures and tables with dynamic references and captions, incorporating mathematical equations, handling bibliographies with citations and style adjustments, troubleshooting common issues and exploring advanced R Markdown extensions.
bookdown: Authoring Books and Technical Documents with R Markdown
Here is a guide to using the {bookdown} package, which extends R Markdown to facilitate the creation of books and technical documents. It covers Markdown syntax, integration of R code, formatting options for HTML, LaTeX and e-book outputs and features such as cross-referencing, custom blocks and theming. The package supports both multipage and single-document outputs, and its applications extend beyond traditional books to include course materials, manuals and other structured content. The work includes practical examples, publishing workflows and details on customisation, alongside information about licensing and the availability of a printed version.
[blogdown]: Creating Websites with R Markdown
Though the authors note that some information may be outdated due to recent updates to Hugo and the {blogdown} package, and they direct readers to additional resources for the latest features and changes, this book still provides a guide to building static websites using R Markdown and the Hugo static site generator, emphasising the advantages of this approach for creating reproducible, portable content. It covers installation, configuration, deployment options such as Netlify and GitHub Pages, migration from platforms like WordPress and advanced topics including custom layouts and version control as well as practical examples, workflow recommendations and discussions on themes, content management and technical aspects of website development.
[pagedown]: Create Paged HTML Documents for Printing from R Markdown
The R package {pagedown} enables users to create paged HTML documents suitable for printing to PDF, using R Markdown combined with a JavaScript library called paged.js, the latter of which implements W3C specifications for paged media. While tools like LaTeX and Microsoft Word have traditionally dominated PDF production, {pagedown} offers an alternative approach through HTML and CSS, supporting a range of document types including resumes, posters, business cards, letters, theses and journal articles.
Documents can be converted to PDF via Google Chrome, Microsoft Edge or Chromium, either manually or through the chrome_print() function, with additional support for server-based, CI/CD pipeline and Docker-based workflows. The package provides customisable CSS stylesheets, a CSS overriding mechanism for adjusting fonts and page properties, and various formatting features such as lists of tables and figures, abbreviations, footnotes, line numbering, page references, cover images, running headers, chapter prefixes and page breaks. Previewing paged documents requires a local or remote web server, and the layout is sensitive to browser zoom levels, with 100% zoom recommended for the most accurate output.
Dynamic Documents with R and knitr
Developed by Yihui Xie and inspired by the earlier {Sweave} package, {knitr} is an R package designed for dynamic report generation that consolidates the functionality of numerous other add-on packages into a single, cohesive tool. It supports multiple input languages, including R, Python and shell scripts, as well as multiple output markup languages such as LaTeX, HTML, Markdown, AsciiDoc and reStructuredText. The package operates on a principle of transparency, giving users full control over how input and output are handled, and runs R code in a manner consistent with how it would behave in a standard R terminal.
Among its notable features are built-in caching, automatic code formatting via the {formatR} package, support for more than 20 graphics devices and flexible options for managing plots within documents. It also allows advanced users to define custom hooks and regular expressions to extend and tailor its behaviour further. The package is affiliated with the Foundation for Open Access Statistics, a nonprofit organisation promoting free software, open access publishing and reproducible research in statistics.
Mastering Shiny is a comprehensive guide to developing web applications using R, focusing on the Shiny framework designed for data scientists. It introduces core concepts such as user interface design, reactive programming and dynamic content generation, while also exploring advanced topics like performance optimisation, security and modular app development. The book covers practical applications across industries, from academic teaching tools to real-time analytics dashboards, and aims to equip readers with the skills to build scalable, maintainable applications. It includes detailed chapters on workflow, layout, visualisation and user interaction, alongside case studies and technical best practices.
Engineering Production-Grade Shiny Apps
This is aimed at developers and team managers who already possess a working knowledge of the Shiny framework for R and wish to advance beyond the basics toward building robust, production-ready applications. Rather than covering introductory Shiny concepts or post-deployment concerns, the book focuses on the intermediate ground between those two stages, addressing project management, workflow, code structure and optimisation.
It introduces the {golem} package as a central framework and guides readers through a five-step workflow covering design, prototyping, building, strengthening and deployment, with additional chapters on optimisation techniques including R code performance, JavaScript integration and CSS. The book is structured to serve both those with project management responsibilities and those focused on technical development, acknowledging that in many small teams these roles are carried out by the same individual.
Outstanding User Interfaces with Shiny
Written by David Granjon and published in 2022, Outstanding User Interfaces with Shiny is a book aimed at filling the gap between beginner and advanced Shiny developers, covering how to deeply customise and enhance Shiny applications to the point where they become indistinguishable from classic web applications. The book spans a wide range of topics, including working with HTML and CSS, integrating JavaScript, building Bootstrap dashboard templates, mobile development and the use of React, providing a comprehensive resource that consolidates knowledge and experience previously scattered across the Shiny developer community.
Now in its second edition, R Packages by Hadley Wickham and Jennifer Bryan is a freely available online guide that teaches readers how to develop packages in R. A package is the core unit of shareable and reproducible R code, typically comprising reusable functions, documentation explaining how to use them and sample data. The book guides readers through the entire process of package development, covering areas such as package structure, metadata, dependencies, testing, documentation and distribution, including how to release a package to CRAN. The authors encourage a gradual approach, noting that an imperfect first version is perfectly acceptable provided each subsequent version improves on the last.
Written by Javier Luraschi, Kevin Kuo and Edgar Ruiz, Mastering Spark with R is a comprehensive guide designed to take readers from little or no familiarity with Apache Spark or R through to proficiency in large-scale data science. The book covers a broad range of topics, including data analysis, modelling, pipelines, cluster management, connections, data handling, performance tuning, extensions, distributed computing, streaming and contributing to the Spark ecosystem.
Happy Git and GitHub for the useR
Here is a practical guide written by Jenny Bryan and contributors, aimed primarily at R users involved in data analysis or package development. It covers the installation and configuration of Git alongside GitHub, the development of key workflows for common tasks and the integration of these tools into day-to-day work with R and R Markdown. The guide is structured to take readers from initial setup through to more advanced daily workflows, with particular attention paid to how Git and GitHub serve the needs of data science rather than pure software development.
Written by John Coene and intended for release as part of the CRC Press R series, JavaScript for R explores how the R programming language and JavaScript can be used together to enhance data science workflows. Rather than teaching JavaScript as a standalone language, the book demonstrates how a limited working knowledge of it can meaningfully extend what R developers can achieve, particularly through the integration of external JavaScript libraries.
The book covers a broad range of topics, progressing from foundational concepts through to data visualisation using the {htmlwidgets} package, bidirectional communication with Shiny, JavaScript-powered computations via the V8 engine and Node.js and the use of modern JavaScript tools such as Vue, React and webpack alongside R. Practical examples are woven throughout, including the building of interactive visualisations, custom Shiny inputs and outputs, image classification and machine learning operations, with all accompanying code made publicly available on GitHub.
This guide addresses challenges faced by developers of R packages that interact with web resources, offering strategies to create reliable unit tests despite dependencies on internet connectivity, authentication and external service availability. It explores tools such as {vcr}, {webmockr}, {httptest} and {webfakes}, which enable mocking and recording HTTP requests to ensure consistent testing environments, reduce reliance on live data and improve test reliability. The text also covers advanced topics like handling errors, securing tests and ensuring compatibility with CRAN and Bioconductor, while emphasising best practices for maintaining test robustness and contributor-friendly workflows. Funded by rOpenSci and the R Consortium, the resource aims to support developers in building more resilient and maintainable R packages through structured testing approaches.
The Shiny AWS Book is an online resource designed to teach data scientists how to deploy, host and maintain Shiny web applications using cloud infrastructure. Addressing a common gap in data science education, it guides readers through a range of DevOps technologies including AWS, Docker, Git, NGINX and open-source Shiny Server, covering everything from server setup and cost management to networking, security and custom configuration.
{ggplot2}: Elegant Graphics for Data Analysis
The third edition of {ggplot2}: Elegant Graphics for Data Analysis provides an in-depth exploration of the Grammar of Graphics framework, focusing on the theoretical foundations and detailed implementation of the ggplot2 package rather than offering step-by-step instructions for specific visualisations. Written by Hadley Wickham, Danielle Navarro and Thomas Lin Pedersen, the book is presented as an online work-in-progress, with content structured across sections such as layers, scales, coordinate systems and advanced programming topics. It aims to equip readers with the knowledge to customise plots according to their needs, rather than serving as a direct guide for creating predefined graphics.
YaRrr! The Pirate’s Guide to R
Written by Nathaniel D. Phillips, this is a beginner-oriented guide to learning the R programming language from the ground up, covering everything from installation and basic navigation of the RStudio environment through to more advanced topics such as data manipulation, statistical analysis and custom function writing. The guide progresses logically through foundational concepts including scalars, vectors, matrices and dataframes before moving into practical areas such as hypothesis testing, regression, ANOVA and Bayesian statistics. Visualisation is given considerable attention across dedicated chapters on plotting, while later sections address loops, debugging and managing data from a variety of file formats. Each chapter includes practical exercises to reinforce learning, and the book concludes with a solutions section for reference.
Data Visualisation: A Practical Introduction
Data Visualisation: A Practical Introduction is a forthcoming second edition from Princeton University Press, written by Kieran Healy and due for release in March 2026, which teaches readers how to explore, understand and present data using the R programming language and the {ggplot2} library. The book aims to bridge the gap between works that discuss visualisation principles without teaching the underlying tools and those that provide code recipes without explaining the reasoning behind them, instead combining both practical instruction and conceptual grounding.
Revised and updated throughout to reflect developments in R and {ggplot2}, the second edition places greater emphasis on data wrangling, introduces updated and new datasets, and substantially rewrites several chapters, particularly those covering statistical models and map-drawing. Readers are guided through building plots progressively, from basic scatter plots to complex layered graphics, with the expectation that by the end they will be able to reproduce nearly every figure in the book and understand the principles that inform each choice.
The book also addresses the growing role of large language models in coding workflows, arguing that genuine understanding of what one is doing remains essential regardless of the tools available. It is suitable for complete beginners, those with some prior R experience, and instructors looking for a course companion, and requires the installation of R, RStudio and a number of supporting packages before work can begin.
Hardening WordPress on Ubuntu and Apache: A practical layered approach
Protecting a WordPress site rarely depends on a single control. Practical hardening layers network filtering, a web application firewall (WAF), careful browser-side restrictions and sensible log-driven tuning. What follows brings together several well-tested techniques and the precise commands needed to get them working, while also calling out caveats and known changes that can catch administrators out. The focus is on Ubuntu and Apache with ModSecurity and the OWASP Core Rule Set for WordPress, but complementary measures round out a cohesive approach. These include a strict Content Security Policy, Cloudflare or Nginx rules for form spam, firewall housekeeping for UFW and Docker, targeted network blocks and automated abuse reporting with Fail2Ban. Where solutions have moved on, that is noted so you do not pursue dead ends.
The Web Application Firewall
ModSecurity and the OWASP Core Rule Set
ModSecurity remains the most widespread open-source web application firewall and has been under the custodianship of the OWASP Foundation since January 2024, having previously been stewarded by Trustwave for over a decade. It integrates closely with the OWASP Core Rule Set (CRS), which aims to shield web applications from a wide range of attacks including the OWASP Top Ten, while keeping false alerts to a minimum. There are two actively maintained engines: 2.9.x is the classic Apache module and 3.x is the newer, cross-platform variant. Whichever engine you pick, the rule set is the essential companion. One important update is worth stating at the outset: CRS 4 replaces exclusion lists with plugins, so older instructions that toggle CRS 3's exclusions no longer apply as written.
Installing ModSecurity on Ubuntu
On Ubuntu 24.04 LTS, installing the Apache module is straightforward. The universe repository ships libapache2-mod-security2 at version 2.9.7, which meets the 2.9.6 minimum required by CRS 4.x, so no third-party repository is needed. You can fetch and enable ModSecurity with the following commands:
sudo apt install libapache2-mod-security2
sudo a2enmod security2
sudo systemctl restart apache2
It is worth confirming the module is loaded before you proceed:
apache2ctl -M | grep security
The default configuration runs in detection-only mode, which does not block anything. Copy the recommended file into place and then edit it so that SecRuleEngine On replaces SecRuleEngine DetectionOnly:
sudo cp /etc/modsecurity/modsecurity.conf-recommended /etc/modsecurity/modsecurity.conf
Open /etc/modsecurity/modsecurity.conf and make the change, then restart Apache once more to apply it.
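If you prefer to script the edit, a sed one-liner achieves the same switch; it is worth checking the file afterwards all the same:
sudo sed -i 's/^SecRuleEngine DetectionOnly/SecRuleEngine On/' /etc/modsecurity/modsecurity.conf
sudo systemctl restart apache2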
Pulling in the Core Rule Set
The next step is to pull in the latest Core Rule Set and wire it up. A typical approach is to clone the upstream repository, move the example setup into place and move the directory named rules into /etc/modsecurity:
cd
git clone https://github.com/coreruleset/coreruleset.git
cd coreruleset
sudo mv crs-setup.conf.example /etc/modsecurity/crs-setup.conf
sudo mv rules/ /etc/modsecurity/
Now adjust the Apache ModSecurity include so that the new crs-setup.conf and all files in /etc/modsecurity/rules are loaded. On Ubuntu, that is governed by /etc/apache2/mods-enabled/security2.conf. Edit this file to reference the new paths, remove any older CRS include lines that might conflict, and then run:
sudo systemctl restart apache2
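For reference, the edited file might end up looking something like this; the exact contents vary by release, so treat it as a sketch that assumes the paths used above:
<IfModule security2_module>
    # Storage for persistent data such as IP collections
    SecDataDir /var/cache/modsecurity
    # Picks up modsecurity.conf and the relocated crs-setup.conf
    IncludeOptional /etc/modsecurity/*.conf
    # Loads the cloned Core Rule Set; the shipped
    # /usr/share/modsecurity-crs include line has been removed
    IncludeOptional /etc/modsecurity/rules/*.conf
</IfModule>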
On Ubuntu 26.04 (due for release in April 2026), the default installation includes a pre-existing CRS configuration at /etc/modsecurity/crs/crs-setup.conf. If this is left in place alongside your own cloned CRS, Apache will fail to start with a Found another rule with the same id error. Remove it before restarting:
sudo rm -f /etc/modsecurity/crs/crs-setup.conf
WordPress-Specific Allowances in CRS 3
WordPress tends to work far better with CRS when its application-specific allowances are enabled. With CRS 3, a variable named tx.crs_exclusions_wordpress can be set in crs-setup.conf to activate those allowances. The commented "exclusions" block in that file includes a template SecAction with ID 900130 that sets application exclusions. Uncomment it and reduce it to the single setvar entry that enables the WordPress flag:
SecAction
"id:900130,
phase:1,
nolog,
pass,
t:none,
setvar:tx.crs_exclusions_wordpress=1"
Reload Apache afterwards with sudo service apache2 reload. If you are on CRS 4, do not use this older mechanism. The project has replaced exclusions with a dedicated WordPress rule exclusions plugin, so follow the CRS 4 plugin documentation instead. The WPSec guide to ModSecurity and CRS covers both the CRS 3 and CRS 4 approaches side by side if you need a reference that bridges the two versions.
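As a sketch of the CRS 4 route, installing the official WordPress plugin amounts to dropping its files into a plugins directory alongside the rules, assuming the clone locations used earlier:
sudo mkdir -p /etc/modsecurity/plugins
sudo git clone https://github.com/coreruleset/wordpress-rule-exclusions-plugin.git /tmp/wp-plugin
sudo cp /tmp/wp-plugin/plugins/* /etc/modsecurity/plugins/
The plugin files then need wiring into the Apache include so that the *-config.conf and *-before.conf files load before the rules directory and any *-after.conf files load after it, as the CRS plugin documentation describes.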
Log Retention and WAF Tuning
Once the WAF is enforcing, logs become central to tuning. Retention is important for forensics as well as for understanding false positives over time, so do not settle for the default two weeks. On Ubuntu, you can extend Apache's logrotate configuration at /etc/logrotate.d/apache2 to keep weekly logs for 52 weeks, giving you a year of history to hand.
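The adjusted stanza in /etc/logrotate.d/apache2 then reads something like this sketch, with only the frequency and retention lines changed:
/var/log/apache2/*.log {
    weekly
    rotate 52
    # ... keep the remaining shipped directives (compress, create,
    # sharedscripts and the postrotate block) unchanged ...
}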
If you see Execution error – PCRE limits exceeded (-8) in the ModSecurity log, increase the following in /etc/modsecurity/modsecurity.conf to give the regular expression engine more headroom:
SecPcreMatchLimit 1000000
SecPcreMatchLimitRecursion 1000000
File uploads can generate an Access denied with code 403 (phase 2). Match of "eq 0" against "MULTIPART_UNMATCHED_BOUNDARY" required error. One remedy used in practice is to comment out the offending check around line 86 of modsecurity.conf and then reload. The built-in Theme Editor can trigger Request body no files data length is larger than the configured limit. Bumping SecRequestBodyLimit to 6000000 addresses that, again followed by a reload of Apache.
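As a sketch, the two adjustments in /etc/modsecurity/modsecurity.conf look like the following; the exact id and wording of the multipart boundary check vary between versions, so locate it in your own file rather than copying this verbatim:
# Comment out the strict multipart boundary check
#SecRule MULTIPART_UNMATCHED_BOUNDARY "!@eq 0" \
#"id:'200004',phase:2,t:none,log,deny,msg:'Multipart parser detected a possible unmatched boundary.'"
# Allow larger request bodies for the Theme Editor
SecRequestBodyLimit 6000000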
Whitelisting Rule IDs for Specific Endpoints
There are occasions where whitelisting specific rule IDs for specific WordPress endpoints is the most pragmatic way to remove false positives without weakening protection elsewhere. Creating a per-site or server-wide include works well; on Ubuntu, a common location is /etc/apache2/conf-enabled/whitelist.conf. For the Theme Editor, adding a LocationMatch block for /wp-admin/theme-editor.php that removes a small set of well-known noisy IDs can help:
<LocationMatch "/wp-admin/theme-editor.php">
SecRuleRemoveById 300015 300016 300017 950907 950005 950006 960008 960011 960904 959006 980130
</LocationMatch>
For AJAX requests handled at /wp-admin/admin-ajax.php, the same set with 981173 added is often used. This style of targeted exclusion mirrors long-standing community advice: find the rule ID in logs, remove it only where it is truly safe to do so, and never disable ModSecurity outright. If you need help finding noisy rules, the following command (also documented by InMotion Hosting) summarises IDs, hostnames and URIs seen in errors:
grep ModSecurity /usr/local/apache/logs/error_log | grep "\[id" |
sed -E -e 's#^.*\[id "([0-9]*)".*\[hostname "([a-z0-9._-]*)"\].*\[uri "#\1 \2 #' |
cut -d'"' -f1 | sort -n | uniq -c | sort -n
Add a matching SecRuleRemoveById line in your include and restart Apache.
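To illustrate, the admin-ajax exclusion mentioned earlier slots into the same include, with 981173 appended to the earlier set:
<LocationMatch "/wp-admin/admin-ajax.php">
SecRuleRemoveById 300015 300016 300017 950907 950005 950006 960008 960011 960904 959006 980130 981173
</LocationMatch>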
Browser-Side Controls: Content Security Policy
Beyond the WAF, browser-side controls significantly reduce the harm from injected content and cross-site scripting. A Content Security Policy (CSP) is both simple to begin and very effective when tightened. An easy starting point is a report-only header that blocks nothing but shows you what would have been stopped. Adding the following to your site lets you open the browser's developer console and watch violations scroll by as you navigate:
Content-Security-Policy-Report-Only: default-src 'self'; font-src 'self'; img-src 'self'; script-src 'self'; style-src 'self'
From there, iteratively allowlist the external origins your site legitimately uses and prefer strict matches. If a script is loaded from a CDN such as cdnjs.cloudflare.com, referencing the exact file or at least the specific directory, rather than the whole domain, reduces exposure to unrelated content hosted there. Inline code is best moved to external files. If that is not possible, hashes can allowlist specific inline blocks and nonces can authorise dynamically generated ones, though the latter must be unpredictable and unique per request. The 'unsafe-inline' escape hatch exists but undermines much of CSP's value and is best avoided.
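As an illustration, a slightly tightened policy set from Apache might look like the following, with mod_headers enabled; the CDN library path is a hypothetical placeholder rather than a recommendation:
Header set Content-Security-Policy-Report-Only "default-src 'self'; font-src 'self'; img-src 'self'; script-src 'self' https://cdnjs.cloudflare.com/ajax/libs/example-lib/; style-src 'self'"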
Once the console is clean, you can add real-time reporting to a service such as URIports (their guide to building a solid CSP is also worth reading) by extending the header:
Content-Security-Policy-Report-Only: default-src 'self'; ...; report-uri https://example.uriports.com/reports/report; report-to default
Pair this with a Report-To header so that you can monitor and prioritise violations at scale. When you are satisfied, switch the key from Content-Security-Policy-Report-Only to Content-Security-Policy to enforce the policy, at which point browsers will block non-compliant content.
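A matching Report-To header might read as follows, reusing the placeholder endpoint from above; the group name must match the report-to token in the CSP header:
Report-To: {"group":"default","max_age":10886400,"endpoints":[{"url":"https://example.uriports.com/reports"}],"include_subdomains":true}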
Server Fingerprints and Security Headers
While working on HTTPS and header hardening, it is useful to trim server fingerprints and raise other browser defences, and this Apache security headers walkthrough covers the rationale behind each directive clearly. Apache's ServerTokens directive can be set in /etc/apache2/apache2.conf to mask version details. Options range from Full to Prod, with the latter sending only Server: Apache. Unsetting X-Powered-By in /etc/apache2/httpd.conf removes PHP version leakage. Adding the following headers in the same configuration file keeps responses out of hostile frames, asks older browsers to block detected XSS and prevents MIME type sniffing:
Header set X-Frame-Options "SAMEORIGIN"
Header set X-XSS-Protection "1; mode=block"
Header set X-Content-Type-Options "nosniff"
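For completeness, the fingerprint-trimming directives described above amount to the following pair; mod_headers must be enabled with a2enmod headers for the second to work:
ServerTokens Prod
Header unset X-Powered-By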
These are not replacements for fixes in application code, but they do give the browser more to work with. If you are behind antivirus products or corporate HTTPS interception, bear in mind that these can cause certificate errors such as SEC_ERROR_UNKNOWN_ISSUER or MOZILLA_PKIX_ERROR_MITM_DETECTED in Firefox. Disabling encrypted traffic scanning in products like Avast, Bitdefender or Kaspersky, or ensuring enterprise interception certificates are correctly installed in Firefox's trust store, resolves those issues. Some errors cannot be bypassed when HSTS is used or when policies disable bypasses, which is the intended behaviour for high-value sites.
Contact Form Spam
Contact form spam is a different but common headache. Analysing access logs often reveals that many automated submissions arrive over HTTP/1.1 while legitimate traffic uses HTTP/2 with modern browser stacks, and this GridPane analysis of a real spam campaign confirms the pattern in detail. That difference gives you something to work with.
Filtering by Protocol in Cloudflare
You can block or challenge HTTP/1.x access to contact pages at the edge with Cloudflare's WAF by crafting an expression that matches both the old protocol and a target URI, while exempting major crawlers. A representative filter looks like this:
(http.request.version in {"HTTP/1.0" "HTTP/1.1" "HTTP/1.2"}
and http.request.uri eq "/contact/"
and not http.user_agent contains "Googlebot"
and not http.user_agent contains "Bingbot"
and not http.user_agent contains "DuckDuckBot"
and not http.user_agent contains "facebot"
and not http.user_agent contains "Slurp"
and not http.user_agent contains "Alexa")
Set the action to block or to a managed challenge as appropriate.
Blocking Direct POST Requests Without a Valid Referrer
Another useful approach is to cut off direct POST requests to /wp-admin/admin-ajax.php and /wp-comments-post.php when the Referer does not contain your domain. In Cloudflare, this becomes:
(http.request.uri contains "/wp-admin/admin-ajax.php"
and http.request.method eq "POST"
and not http.referer contains "yourwebsitehere.com")
or
(http.request.uri contains "/wp-comments-post.php"
and http.request.method eq "POST"
and not http.referer contains "yourwebsitehere.com")
The same logic can be applied in Nginx with small site includes that set variables based on $server_protocol and $http_user_agent, then return 403 if a combination such as HTTP/1.1 on /contact/ by a non-whitelisted bot is met. It is sensible to verify with Google Search Console or similar that legitimate crawlers are not impeded once rules are live.
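A minimal sketch of that Nginx approach follows, with hypothetical variable names and an abbreviated crawler list; the map blocks belong at the http level and the location block in the relevant server:
# In the http block: classify the protocol and user agent
map $server_protocol $legacy_proto {
    default        0;
    "HTTP/1.0"     1;
    "HTTP/1.1"     1;
}
map $http_user_agent $trusted_bot {
    default                              0;
    "~*(Googlebot|Bingbot|DuckDuckBot)"  1;
}
# In the server block: refuse legacy-protocol, non-crawler requests
location = /contact/ {
    set $deny "${legacy_proto}${trusted_bot}";
    if ($deny = "10") { return 403; }
    # ... normal WordPress handling continues here ...
}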
Complementary Mitigations Inside WordPress
Three complementary tools work well alongside the server-side measures already covered. The first is WP Armour, a free honeypot anti-spam plugin that adds a hidden field to comment forms, contact forms and registration pages using JavaScript. Because most spambots do not execute JavaScript, they fail the hidden-field check that a real browser passes automatically, and their submissions are rejected silently. No CAPTCHA, API key or subscription is required, and the plugin is GDPR-compliant with no external server calls.
The second measure is entirely native to WordPress. Navigate to Settings, then Discussion and tick "Automatically close comments on articles older than X days." Spammers disproportionately target older content because it tends to be less actively monitored, so setting this to 180 days significantly reduces spam without affecting newer posts where discussion is still active. The value can be adjusted to suit the publishing cadence of the site.
The third layer is Akismet, developed by Automattic. Akismet passes each comment through its cloud-based filter and marks likely spam before it ever appears in the moderation queue. It is free for personal sites and requires an API key obtained from the Akismet website. Used alongside WP Armour, the two cover different vectors: WP Armour stops most bot submissions before they are processed at all, while Akismet catches those that reach the comment pipeline regardless of origin. Complementing both, reCAPTCHA v3 or hCaptcha (where privacy demands it) and simple "bot test" questions remain useful additions, though any solution that adds heavy database load warrants testing before large-scale deployment.
Host-Level Firewalls: UFW and Docker
Host-level firewalls remain important, particularly when Docker is in the mix. Ubuntu's UFW is convenient, but Docker's default iptables rules can bypass UFW and expose published ports to the public network even when ufw deny appears to be in place. One maintained solution uses the kernel's DOCKER-USER chain, so UFW regains control without disabling Docker's iptables management.
Appending a short block to /etc/ufw/after.rules that defines ufw-user-forward, a ufw-docker-logging-deny target and a DOCKER-USER chain, then jumps from DOCKER-USER into ufw-user-forward, allows UFW to govern forwarded traffic. Returning early for RELATED,ESTABLISHED connections, dropping invalid ones, accepting docker0-to-docker0 traffic and returning for RFC 1918 source ranges preserves internal communications. New connection attempts from public networks destined for private address ranges are logged and dropped, with a final RETURN handing off to Docker's own rules for permitted flows.
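The appended block typically looks like the following sketch, modelled on the widely used ufw-docker rules that the description above summarises; the subnets and logging prefix are conventions rather than requirements:
# BEGIN UFW AND DOCKER
*filter
:ufw-user-forward - [0:0]
:ufw-docker-logging-deny - [0:0]
:DOCKER-USER - [0:0]
-A DOCKER-USER -j ufw-user-forward
-A DOCKER-USER -m conntrack --ctstate RELATED,ESTABLISHED -j RETURN
-A DOCKER-USER -m conntrack --ctstate INVALID -j DROP
-A DOCKER-USER -i docker0 -o docker0 -j ACCEPT
-A DOCKER-USER -j RETURN -s 10.0.0.0/8
-A DOCKER-USER -j RETURN -s 172.16.0.0/12
-A DOCKER-USER -j RETURN -s 192.168.0.0/16
# Log and drop fresh public-network connections aimed at private ranges
-A DOCKER-USER -p tcp -m tcp --tcp-flags FIN,SYN,RST,ACK SYN -d 10.0.0.0/8 -j ufw-docker-logging-deny
-A DOCKER-USER -p tcp -m tcp --tcp-flags FIN,SYN,RST,ACK SYN -d 172.16.0.0/12 -j ufw-docker-logging-deny
-A DOCKER-USER -p tcp -m tcp --tcp-flags FIN,SYN,RST,ACK SYN -d 192.168.0.0/16 -j ufw-docker-logging-deny
-A DOCKER-USER -p udp -m udp --dport 0:32767 -d 10.0.0.0/8 -j ufw-docker-logging-deny
-A DOCKER-USER -p udp -m udp --dport 0:32767 -d 172.16.0.0/12 -j ufw-docker-logging-deny
-A DOCKER-USER -p udp -m udp --dport 0:32767 -d 192.168.0.0/16 -j ufw-docker-logging-deny
# Hand anything else back to Docker's own rules
-A DOCKER-USER -j RETURN
-A ufw-docker-logging-deny -m limit --limit 3/min --limit-burst 10 -j LOG --log-prefix "[UFW DOCKER BLOCK] "
-A ufw-docker-logging-deny -j DROP
COMMIT
# END UFW AND DOCKER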
Restart UFW to activate the change:
sudo systemctl restart ufw
# or
sudo ufw reload
From that point, you can allow external access to a container's service port:
ufw route allow proto tcp from any to any port 80
Or scope to a specific container IP if needed:
ufw route allow proto tcp from any to 172.17.0.2 port 80
UDP rules follow the same pattern. If you prefer not to edit by hand, the UFW-docker helper script can install, check and manage these rules for you. It can auto-detect Docker subnets, supports IPv6 by enabling ip6tables and a ULA (Unique Local Address) range in /etc/docker/daemon.json, and can manage Swarm service exposure from manager nodes.
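A typical session with the helper script runs along these lines, assuming it has been placed on the PATH and that my-container stands in for a real container name:
# Install the after.rules block shown earlier
sudo ufw-docker install
# Expose port 80 of a named container
sudo ufw-docker allow my-container 80/tcp
# Review what is currently permitted
sudo ufw-docker status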
Should you instead use Firewalld, note that it provides a dynamically managed firewall with zones, a D-Bus API and runtime versus permanent configuration separation. It is the default in distributions such as RHEL, CentOS, Fedora and SUSE, and it also works with Docker's iptables backend, though the interaction model differs from UFW's.
Keeping Firewall Rules Tidy
Keeping firewall rules tidy is a small but useful habit. UFW can show verbose and numbered views of its state, as Linuxize's UFW rules guide explains in full:
sudo ufw status verbose
sudo ufw status numbered
Delete rules safely by number or by specification:
sudo ufw delete 4
sudo ufw delete allow 80/tcp
If you are scripting changes, the --force flag suppresses the interactive prompt. Take care never to remove your SSH allow rule when connected remotely, and remember that rule numbers change after deletions, so it is best to list again before removing the next one.
Logging Abusers with Fail2Ban and AbuseIPDB
Logging abusers and reporting them can reduce repeat visits. Fail2Ban watches logs for repeated failures and bans IPs by updating firewall rules for a set period. It can also report to AbuseIPDB via an action that was introduced in v0.10.0 (January 2017), which many installations have today.
Confirm that /etc/fail2ban/action.d/abuseipdb.conf exists and that your /etc/fail2ban/jail.local defines action_abuseipdb = abuseipdb. Within each jail that you want reported, add the following alongside your normal ban action, using categories that match the jail's purpose, such as SSH brute forcing:
%(action_abuseipdb)s[abuseipdb_apikey="my-api-key", abuseipdb_category="18,22"]
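In context, a complete jail entry might read as follows, with the sshd jail chosen as an example and my-api-key standing in for a real key:
[sshd]
enabled = true
action  = %(action_)s
          %(action_abuseipdb)s[abuseipdb_apikey="my-api-key", abuseipdb_category="18,22"]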
Reload with fail2ban-client reload and watch your AbuseIPDB reported IPs page to confirm submissions are flowing. If reports do not arrive, check /var/log/fail2ban.log for cURL errors and ensure your API key is correct, bearing in mind default API limits and throttling. Newer Fail2Ban versions (0.9.0 and above) use a persistent database, so duplicate reports after a restart are less of a concern. If you run older releases, a wrapper script can avoid duplicates by checking ban times before calling the API.
Blocking Provider Ranges
Occasionally, administrators choose to block traffic from entire provider ranges that are persistent sources of scanning or abuse. There are scripts such as the AWS-blocker tool that fetch the official AWS IPv4 and IPv6 ranges and insert iptables rules to block them all, and community posts such as this rundown of poneytelecom.eu ranges that shares specific ranges associated with problematic hosts for people who have seen repeated attacks from those networks. Measures like these are blunt instruments that can have side effects, so they warrant careful consideration and ongoing maintenance if used at all. Where possible, it is preferable to block based on behaviour, authentication failures and reputation rather than on broad ownership alone.
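For the AWS case, the approach reduces to fetching the published ranges and inserting drop rules, as in this minimal sketch; it assumes jq is installed, accepts the blunt-instrument caveats above and ignores IPv6 for brevity:
curl -s https://ip-ranges.amazonaws.com/ip-ranges.json |
jq -r '.prefixes[].ip_prefix' | sort -u |
while read -r net; do
    # Drop all inbound traffic from this published AWS IPv4 range
    sudo iptables -A INPUT -s "$net" -j DROP
done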
Final ModSecurity Notes: Chasing False Positives
Two final ModSecurity notes help when chasing false positives. First, WordPress comments and posting endpoints can trip generic SQL injection protections such as rule 300016 when text includes patterns that appear dangerous to a naive filter, a well-documented occurrence that catches many administrators out. Watching /etc/httpd/logs/modsec_audit.log or the Apache error log immediately after triggering the offending behaviour, and then scoping SecRuleRemoveById lines to the relevant WordPress locations such as /wp-comments-post.php and /wp-admin/post.php, clears real-world issues without turning off protections globally. Second, when very large responses are legitimately expected in parts of wp-admin, increasing SecResponseBodyLimit in an Apache or Nginx ModSecurity context can be more proportionate than whitelisting many checks at once. Always restart or reload Apache after changes so that your edits take effect.
Defence in Depth
Taken together, these layers complement each other well. ModSecurity with CRS gives you broad, configurable protection at the HTTP layer. CSP and security headers narrow the browser's attack surface and put guardrails in place for any client-side content issues. Targeted edge and server rules dampen automated spam without hindering real users or crawlers. Firewalls remain the bedrock, but modern container tooling means integrating UFW or Firewalld with Docker requires a small amount of extra care. Logs feed both your WAF tuning and your ban lists, and when you report abusers you contribute to a wider pool of threat intelligence. None of this removes the need to keep WordPress core, themes and plugins up to date, but it does mean the same attacks are far less likely to succeed or even to reach your application in the first place.
Launching SAS Analytics Pro on Viya with automated Docker image clean-up
For my freelancing, I have a licensed version of SAS Analytics Pro running in a Docker container on my main Linux workstation. Every time there is a new release, a new Docker image is made available, which means that a few of them can accumulate on a system. Aside from taking up disk space that could have other uses, this also makes it tricky to automate the startup of the associated Docker container. Avoiding this means pruning the Docker images available on the system, something that also needs automation.
To make things clearer, let me work through the launch script that I use; this is called by the script that checks for and then downloads any new image that is available, should that be needed. First up is the shebang, which uses the -e switch so that the script exits in the event of an error. That puts a stop to any potentially destructive outcomes from later commands executing without the input that they need.
#!/bin/bash -e
Next comes the command to shut down the existing container. A running container keeps a reference to its image, so leaving the old one up would prevent its image's removal once a new one is instated; in any case, performing the remaining steps against an already running container would result in errors.
if docker container ls -a --format '{{.Names}}' | grep -q '^sas-analytics-pro$'; then
docker container stop sas-analytics-pro
fi
After that, the step to find the latest image is performed. Once, I did this by looping through image ages by days, weeks and months, hardly an elegant or robust approach. What follows is altogether more effective.
# Find latest SAS Analytics Pro image
IMAGE=$(docker image ls --format '{{.Repository}}:{{.Tag}} {{.CreatedAt}}' \
| grep 'sas-analytics-pro' \
| sort -k2,3r \
| head -n 1 \
| awk '{print $1}')
echo "Chosen image: $IMAGE"
Since there is quite a lot happening above, let us unpack the actions. The first part lists all Docker images, formatting each line to show the image name (repository:tag) followed by its creation timestamp: docker image ls --format '{{.Repository}}:{{.Tag}} {{.CreatedAt}}'. The next piece picks out all the images that are for SAS Analytics Pro: grep 'sas-analytics-pro'. The crucial step, sort -k2,3r, comes next and this sorts the results by the second and third fields (the creation date and time) in reverse order, so the newest images appear first. With that done, it is time to pick out the most recent image using head -n 1. To pick out the image name, you need awk '{print $1}'. This is wrapped within IMAGE=$(...) to assign the result to a variable that is printed to the console using an echo statement.
With the image selected, you can then spin up the container once you specify the other parameters to use and allow some sleep time afterwards before proceeding to the clean-up steps:
run_args="
-e SASLOCKDOWN=0
--name=sas-analytics-pro
--rm
--detach
--hostname sas-analytics-pro
--env RUN_MODE=developer
--env SASLICENSEFILE=[Path to SAS licence file]
--publish 8080:80
--volume ${PWD}/sasinside:/sasinside
--volume ${PWD}/sasdemo:/data2
--volume [location of SAS files on the system]:/data
--cap-add AUDIT_WRITE
--cap-add SYS_ADMIN
--publish 8222:22
"
if ! docker run -u root ${run_args} "$IMAGE" "$@" > /dev/null 2>&1; then
echo "Failed to run the image."
exit 1
fi
sleep 5
With the new container in action, the subsequent step is to find the older images and delete those. Again, the docker image command is invoked, with its output fed to a selection command for SAS Analytics Pro images. Once the current image has been removed from the listing by the grep -v command, the list of images to be deleted is assigned to the IMAGES_TO_REMOVE variable.
IMAGES_TO_REMOVE=$(docker image ls --format '{{.Repository}}:{{.Tag}}' \
| grep 'sas-analytics-pro' \
| grep -v "^$IMAGE$")
echo "Will remove older images:"
echo "$IMAGES_TO_REMOVE"
After that has happened, iterating through the list of images using a for loop will remove them one at a time using the docker image rm command:
for OLD in $IMAGES_TO_REMOVE; do
echo "Removing $OLD"
docker image rm "$OLD" || echo "Could not remove $OLD"
done
All this concludes the operation of spinning up a new SAS Analytics Pro Docker container while also removing any superseded Docker images. One last step is to capture the password to use for logging into the SAS Studio interface that is available at localhost:8080 or whatever address and port is being used to serve the application:
docker logs sas-analytics-pro 2>&1 | grep "Password=" > pw.txt
Folding updating and housekeeping into the same activity as spinning up the Docker container means that I need not think about doing anything else. The time taken by these extra steps repays the effort, since the latest version is always running in a tidy environment, with nothing left to remember.