France | Technology Tales

From summary statistics to published reports with R, LaTeX and TinyTeX

19th March 2026

For anyone working across LaTeX, R Markdown and data analysis in R, there comes a point where separate tools begin to converge. Data has to be summarised, those summaries have to be turned into presentable tables and the finished result has to compile into a report that looks appropriate for its audience rather than a console dump. These notes follow that sequence, moving from the practical business of summarising data in R through to tabulation and then on to the publishing infrastructure that makes clean PDF and Word output possible.

Summarising Data with {dplyr}

The starting point for many analyses is a quick exploration of the data at hand. One useful example uses the anorexia dataset from the {MASS} package together with {dplyr}. The dataset contains weight change data for young female anorexia patients, divided into three treatment groups: Cont for the control group, CBT for cognitive behavioural treatment and FT for family treatment.

The basic manipulation starts by loading {MASS} and {dplyr}, then using filter() to create separate subsets for each treatment group. From there, mutate() adds a wtDelta column defined as Postwt - Prewt, giving the weight change for each patient. group_by(Treat) prepares the data for grouped summaries, and arrange(wtDelta) sorts the rows by weight change; note that current versions of {dplyr} ignore grouping in arrange() unless .by_group = TRUE is set. The notes then show how the pipe operator, %>%, which {dplyr} re-exports from {magrittr}, makes the workflow more readable by chaining these operations. The final summary table uses summarize() to compute the number of observations, the mean weight change and the standard deviation within each treatment group. The reported values are: for CBT, 29 observations, a mean weight change of 3.006897 and a standard deviation of 7.308504; for Cont, 26 observations, a mean weight change of -0.450000 and a standard deviation of 7.988705; and for FT, 17 observations, a mean weight change of 7.264706 and a standard deviation of 7.157421.
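As a language-neutral illustration of what that grouped summary computes, the short Python sketch below derives a per-group count, mean and sample standard deviation of the post-minus-pre weight change. The records are made up for illustration and are not the anorexia data, and the function name grouped_summary is hypothetical; in R, the corresponding steps are the mutate(), group_by() and summarize() calls described above.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical (treatment, pre-weight, post-weight) records.
# These are illustrative values only, NOT the anorexia dataset.
records = [
    ("CBT", 80.0, 82.0),
    ("CBT", 85.0, 84.0),
    ("CBT", 78.0, 84.0),
    ("Cont", 81.0, 80.0),
    ("Cont", 90.0, 91.0),
    ("FT", 83.0, 90.0),
    ("FT", 79.0, 88.0),
]


def grouped_summary(rows):
    """Return count, mean and sample SD of (post - pre) per treatment."""
    deltas = defaultdict(list)
    for treat, pre, post in rows:
        # Equivalent of mutate(wtDelta = Postwt - Prewt)
        deltas[treat].append(post - pre)

    summary = {}
    for treat in sorted(deltas):  # equivalent of group_by(Treat)
        values = deltas[treat]
        summary[treat] = {
            "count": len(values),
            "avg_delta": mean(values),
            # stdev() uses the n - 1 denominator, as R's sd() does;
            # it needs at least two values per group.
            "sd_delta": stdev(values),
        }
    return summary


for treat, stats in grouped_summary(records).items():
    print(treat, stats)
```

The sample standard deviation is used deliberately so that the figures would be comparable with what sd() reports in R.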

That example is not presented as a complete statistical analysis. Instead, it serves as a quick exploratory route into the data, and the original notes are careful to say that this is only a glance, not a rigorous analysis.

Choosing an R Package for Descriptive Summaries

The question of how best to summarise data opens up a broader comparison of R packages for descriptive statistics. A useful review sets out a common set of needs: a count of observations, the number and types of fields, transparent handling of missing data and sensible statistics that depend on the data type. Numeric variables call for measures such as mean, median, range and standard deviation, perhaps with percentiles. Categorical variables call for counts of levels and some sense of which categories dominate.

Base R's summary() does some of this reasonably well. It distinguishes categorical from numeric variables and reports distributions or numeric summaries accordingly, while also highlighting missing values. However, it does not show an overall record count, lacks standard deviation and is not especially tidy or ready for tools such as kable. Several contributed packages aim to improve on that. Hmisc::describe() gives counts of variables and observations, handles both categorical and numerical data and reports missing values clearly, showing the highest and lowest five values for numeric data instead of a simple range. pastecs::stat.desc() is more focused on numeric variables and provides confidence intervals, standard errors and optional normality tests. psych::describe() includes categorical variables but converts them to numeric codes by default before describing them, and the package documentation itself advises interpreting the resulting statistics cautiously. psych::describeBy() extends this approach to grouped summaries and can return a matrix form with mat = TRUE.

Among the packages reviewed, {skimr} receives especially strong attention for balancing readability and downstream usefulness. skim() reports record and variable counts clearly, separates variables by type and includes missing data and standard summaries in an accessible layout. It also works with group_by() from {dplyr}, making grouped summaries straightforward to produce. More importantly for analytical workflows, the skim output can be treated as a tidy data frame in which each combination of variable and statistic is represented in long form, meaning the results can be filtered, transformed and plotted with standard tidyverse tools such as {ggplot2}.

{summarytools} is presented as another strong option, though with a distinction between its functions. descr() handles numeric variables and can be converted to a data frame for use with kable, while dfSummary() works across entire data frames and produces an especially polished summary. At the time of the original notes, dfSummary() was considered slow. As documented in the same review, the package author subsequently traced the issue to an excessive number of histogram breaks being generated for variables with large values and imposed a limit to resolve it. The package also supports output through view(dfSummary(data)), which yields an attractive HTML-style summary.

Grouped Summary Table Packages

Once the data has been summarised, the next step is turning those summaries into formal tables. A detailed comparison covers a number of packages specifically designed for this purpose: {arsenal}, {qwraps2}, {Amisc}, {table1}, {tangram}, {furniture}, {tableone}, {compareGroups} and {Gmisc}. {arsenal} is described as highly functional and flexible, with tableby() able to create grouped tables in only a few lines and then be customised through control objects that specify tests, display statistics, labels and missing value treatment. {qwraps2} offers a lot of flexibility through nested lists of summary specifications, though at the cost of more code. {Amisc} can produce grouped tables and works with pander::pandoc.table(), but is noted as not being on CRAN. {table1} creates attractive tables with minimal code, though its treatment of missing values may not suit every use case. {tangram} produces visually appealing HTML output and allows custom rows such as missing counts to be inserted manually, although only HTML output is supported. {furniture} and {tableone} both support grouped table creation, but {tableone} in particular is notable because it is widely used in biomedical research for baseline characteristics tables.

The {tableone} package deserves separate mention because it is designed to summarise continuous and categorical variables in one table, a common need in medical papers. As the package introduction explains, CreateTableOne() can be used on an entire dataset or on a selected subset of variables, with factorVars specifying variables that are coded numerically but should be treated as categorical. The package can display all levels for categorical variables, report missing values via summary() and switch selected continuous variables to non-normal summaries using medians and interquartile ranges instead of means and standard deviations. For grouped comparisons, it prints p-values by default and can switch to non-parametric tests or Fisher's exact test where needed. Standardised mean differences can also be shown. Output can be captured as a matrix and written to CSV for editing in Excel or Word.

Styling and Exporting Tables

With tables constructed, the focus shifts to how they are presented and exported. As Hao Zhu's conference slides explain, the {kableExtra} package builds on knitr::kable() and provides a grammar-like approach to adding styling layers. It imports the pipe operator %>% from {magrittr} so that formatting functions can be chained in much the same way that layers are added in {ggplot2}. It supports themes such as kable_paper, kable_classic, kable_minimal and kable_material, as well as options for striping, hover effects, condensed layouts, fixed headers, grouped rows and columns, footnotes, scroll boxes and inline plots.

Table output is often the visible end of an analysis, and a broader review of R table packages covers a range of approaches that go well beyond the default output. In R Markdown, packages such as {gt}, {kableExtra}, {formattable}, {DT}, {reactable}, {reactablefmtr} and {flextable} all offer richer possibilities. Some are aimed mainly at HTML output, others at Word. {DT} in particular supports highly customised interactive tables with searching, filtering and cell styling through more advanced R and HTML code. {flextable} is highlighted as the strongest option when knitting to Word, given that the other packages are primarily designed for HTML.

For users working in Word-heavy settings, older but still practical workflows remain relevant too. One approach is simply to write tables to comma-separated text files and then paste and convert the content into a Word table. Another route is through {arsenal}'s write2 functions, designed as an alternative to SAS ODS. The convenience functions write2word(), write2html() and write2pdf() accept a wide range of objects: tableby, modelsum, freqlist and comparedf from {arsenal} itself, as well as knitr::kable(), xtable::xtable() and pander::pander_return() output. One notable constraint is that {xtable} is incompatible with write2word(). Beyond single tables, the functions accept a list of objects so that multiple tables, headers, paragraphs and even raw HTML or LaTeX can all be combined into a single output document. A yaml() helper adds a YAML header to the output, and a code.chunk() helper embeds executable R code chunks, while the generic write2() function handles formats beyond the three convenience wrappers, such as RTF.

The Publishing Infrastructure: CTAN and Its Mirrors

Producing PDF output from R Markdown depends on a working LaTeX installation, and the backbone of that ecosystem is CTAN, the Comprehensive TeX Archive Network. CTAN is the main archive for TeX and LaTeX packages and is supported by a large collection of mirrors spread around the world. The purpose of this distributed system is straightforward: users are encouraged to fetch files from a site that is close to them in network terms, which reduces load and tends to improve speed.

That global spread is extensive. The CTAN mirror list organises sites alphabetically by continent and then by country, with active sites listed across Africa, Asia, Europe, North America, Oceania and South America. Africa includes mirrors in South Africa and Morocco. Asia has particularly wide coverage, with many mirrors in China as well as sites in Korea, Hong Kong, India, Indonesia, Japan, Singapore, Taiwan, Saudi Arabia and Thailand. Europe is especially rich in mirrors, with hosts in Denmark, Germany, Spain, France, Italy, the Netherlands, Norway, Poland, Portugal, Romania, Switzerland, Finland, Sweden, the United Kingdom, Austria, Greece, Bulgaria and Russia. North America includes Canada, Costa Rica and the United States, while Oceania covers Australia and South America includes Brazil and Chile.

The details matter because different mirrors expose different protocols. While many support HTTPS, some also offer HTTP, FTP or rsync. CTAN provides a mirror multiplexer to make the common case simpler: pointing a browser to https://mirrors.ctan.org/ results in automatic redirection to a mirror in or near the user's country. There is one caveat. The multiplexer always redirects to an HTTPS mirror, so anyone intending to use another protocol needs to select manually from the mirror list. That is why the full listings still include non-HTTPS URLs alongside secure ones.

There is also an operational side to the network that is easy to overlook when things are working well. CTAN monitors mirrors to ensure they are current, and if one falls behind, then mirrors.ctan.org will not redirect users there. Updates to the mirror list can be sent to ctan@ctan.org. The master host of CTAN is ftp.dante.de in Cologne, Germany, with rsync access available at rsync://rsync.dante.ctan.org/CTAN/ and web access on https://ctan.org/. For those who want to contribute infrastructure rather than simply use it, CTAN also invites volunteers to become mirrors.

TinyTeX: A Lightweight LaTeX Distribution

This infrastructure becomes much more tangible when looking at TinyTeX, a lightweight, cross-platform, portable and easy-to-maintain LaTeX distribution based on TeX Live. It is small in size but intended to function well in most situations, especially for R users. Its appeal lies in not requiring users to install thousands of packages they will never use, installing them as needed instead. Installation can also be done without administrator privileges, which removes one of the more familiar barriers around traditional TeX setups. TinyTeX can even be run from a flash drive.

For R users, TinyTeX is closely tied to the {tinytex} R package. The distinction is important: tinytex in lower case refers to the R package, while TinyTeX refers to the LaTeX distribution. Installation is intentionally direct. After installing the R package with install.packages('tinytex'), a user can run tinytex::install_tinytex(). Uninstallation is equally simple with tinytex::uninstall_tinytex(). For the average R Markdown user, that is often enough. Once TinyTeX is in place, PDF compilation usually requires no further manual package management.

There is slightly more to know if the aim is to compile standalone LaTeX documents from R. The {tinytex} package provides wrappers such as pdflatex(), xelatex() and lualatex(). These functions detect required LaTeX packages that are missing and install them automatically by default. In practical terms, that means a small example document can be written to a file and compiled with tinytex::pdflatex('test.tex') without much concern about whether every dependency has already been installed. For R users, this largely removes the old pattern of cryptic missing-package errors followed by manual searching through TeX repositories.

Developers may want more than the basics, and TinyTeX has a path for that as well. A helper such as tinytex:::install_yihui_pkgs() installs a collection of packages needed for building the PDF vignettes of many CRAN packages. That is a specific convenience rather than a universal requirement, but it illustrates the design philosophy behind TinyTeX: keep the initial footprint light and offer ways to add what is commonly needed later.

Using TinyTeX Outside R

For users outside R, TinyTeX still works, but the focus shifts to the command-line utility tlmgr. The documentation is direct in its assumptions: if command-line work is unwelcome, another LaTeX distribution may be a better fit. The central command is tlmgr, and much of TinyTeX maintenance can be expressed through it.

On Linux, installation places TinyTeX in $HOME/.TinyTeX and creates symlinks for executables such as pdflatex under $HOME/bin, or under $HOME/.local/bin if that directory exists. The installation script is fetched with wget and piped to sh, after first checking that Perl is correctly installed. On macOS, TinyTeX lives in ~/Library/TinyTeX, and users without write permission to /usr/local/bin may need to change ownership of that directory before installation. Windows users can run a batch file, install-bin-windows.bat. The default installation directory there is %APPDATA%/TinyTeX unless APPDATA contains spaces or non-ASCII characters, in which case %ProgramData% is used instead. PowerShell version 3.0 or higher is required on Windows.
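The Windows default described above amounts to a small decision rule. The following is a minimal Python sketch of that rule, assuming only what the text states. The function name windows_install_base and the example paths are hypothetical, and the real installer is a batch file rather than Python.

```python
def windows_install_base(appdata: str, programdata: str) -> str:
    """Pick the base directory following the rule described in the text.

    APPDATA is used unless its path contains spaces or non-ASCII
    characters, in which case ProgramData is used instead.
    """
    has_space = " " in appdata
    has_non_ascii = not appdata.isascii()  # str.isascii() needs Python 3.7+
    if has_space or has_non_ascii:
        return programdata
    return appdata


# Hypothetical example paths, for illustration only.
plain = r"C:\Users\ana\AppData\Roaming"
spaced = r"C:\Users\Ana Maria\AppData\Roaming"
accented = "C:\\Users\\Jos\u00e9\\AppData\\Roaming"
fallback = r"C:\ProgramData"

for path in (plain, spaced, accented):
    print(path, "->", windows_install_base(path, fallback))
```

Keeping the rule in a pure function makes it easy to check a user profile path before running the installer.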

Uninstallation follows the same self-contained logic on every platform: tlmgr path remove is run first, and the TinyTeX installation folder is then deleted. This simplicity is a deliberate contrast with larger LaTeX distributions, which are considerably more involved to remove cleanly.

Maintenance and Package Management

Maintenance is where TinyTeX's relationship to CTAN and TeX Live becomes especially visible. If a document fails with an error such as File 'times.sty' not found, the fix is to search for the package containing that file with tlmgr search --global --file "/times.sty". In the example given, that identifies the psnfss package, which can then be installed with tlmgr install psnfss. If the package includes executables, tlmgr path add may also be needed. An alternative route is to upload the error log to the yihui/latex-pass GitHub repository, where package searching is carried out remotely.
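The search step lends itself to a little automation. The sketch below, in Python, pulls missing file names out of a LaTeX log and builds the tlmgr search command quoted above. The helper names missing_files and tlmgr_search_command and the sample log line are assumptions made for illustration; the regular expression assumes the usual wording of the LaTeX error message, and nothing here runs tlmgr itself.

```python
import re
import shlex

# Assumed shape of the error line: File `times.sty' not found
# (LaTeX typically opens the quote with a backtick and closes it with ').
MISSING_FILE = re.compile(r"File [`']([^'`]+)' not found")


def missing_files(log_text):
    """Return the distinct missing file names, in order of appearance."""
    return list(dict.fromkeys(MISSING_FILE.findall(log_text)))


def tlmgr_search_command(filename):
    """Build the argument list for the tlmgr search described in the text."""
    return ["tlmgr", "search", "--global", "--file", "/" + filename]


# Hypothetical log excerpt for illustration.
sample_log = "! LaTeX Error: File `times.sty' not found.\n"

for name in missing_files(sample_log):
    # shlex.join (Python 3.8+) renders the list as a copy-pasteable command.
    print(shlex.join(tlmgr_search_command(name)))
```

Returning an argument list rather than a string means the command can be passed straight to subprocess.run() without shell quoting concerns, should that be wanted.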

If the problem is less obvious, a full update cycle is suggested: tlmgr update --self --all, then tlmgr path add and fmtutil-sys --all. R users have wrappers for these tasks too, including tlmgr_search(), tlmgr_install() and tlmgr_update(). Some situations still require a full reinstallation. If TeX Live reports Remote repository newer than local, TinyTeX should be reinstalled manually, which for R users can be done with tinytex::reinstall_tinytex(). Similarly, when a TeX Live release is frozen in preparation for a new one, the advice is simply to wait and then reinstall when the next release is ready.

The motivation behind TinyTeX is laid out with unusual clarity. Traditional LaTeX distributions often present a choice between a small basic installation that soon proves incomplete and a very large full installation containing thousands of packages that will never be used. TinyTeX is framed as a way around those frustrations by building on TeX Live's portability and cross-platform design while stripping away unnecessary size and complexity. The acknowledgements also underline that TinyTeX depends on the work of the TeX Live team.

Connecting the R Workflow to a Finished Report

Taken together, these notes show how closely summarisation, tabulation and publishing are linked. {dplyr} and related tools make it easy to summarise data quickly, while a wide range of R packages then turn those summaries into tables that are not only statistically useful but also presentable. CTAN and its mirrors keep the TeX ecosystem available and current across the world, and TinyTeX builds on that ecosystem to make LaTeX more manageable, especially for R users. What begins with a grouped summary in the console can end with a polished report table in HTML, PDF or Word, and understanding the chain between those stages makes the whole workflow feel considerably less mysterious.

A survey of commenting systems for static websites

25th February 2026

This piece grew out of a practical problem. When building a Hugo website, I went looking for a way to add reader comments. The remotely hosted options I found were either subscription-based or visually intrusive in ways that clashed with the site design. Moving to the self-hosted alternatives brought a different set of difficulties: setup proved neither straightforward nor reliably successful, and after some time I concluded that going without comments was the more sensible outcome.

That experience is, it turns out, a common one. The commenting problem for static sites has no clean solution, and the landscape of available tools is wide enough to be disorienting. What follows is a survey of what is currently out there, covering federated, hosted and self-hosted approaches, so that others facing the same decision can at least make an informed choice about where to invest their time.

Federated Options

At one end of the spectrum sit the federated solutions, which take the most principled approach to data ownership. Federated systems such as Cactus Comments stand out by building on the Matrix open standard, a decentralised protocol for real-time communication governed by the Matrix.org Foundation. Because comments exist as rooms on the Matrix network, they are not siloed within any single server, and users can engage with discussions using an existing Matrix account on any compatible home server, or follow threads using any Matrix client of their choosing. Site owners, meanwhile, retain the flexibility to rely on the public Cactus Comments service or to run their own Matrix home server, avoiding third-party tracking and centralised control alike. The web client is LGPLv3 licensed and the backend service is AGPLv3 licensed, making the entire stack free and open source.

Solutions for Publishers and Media Outlets

For publishers and media organisations, Coral by Vox Media offers a well-established and feature-rich alternative. Originally founded in 2014 as a collaboration between the Mozilla Foundation, The New York Times and The Washington Post, with funding from the Knight Foundation, it moved to Vox Media in 2019 and was released as open-source software. It provides advanced moderation tools supported by AI technology, real-time comment alerts and in-depth customisation through its GraphQL API. Its capacity to integrate with existing user authentication systems makes it a compelling choice for organisations that wish to maintain editorial control without sacrificing community engagement. Coral is currently deployed across 30 countries and in 23 languages, a breadth of adoption that reflects its standing among publishers of all sizes. The team has recently expanded the product to include a live Q&A tool alongside the core commenting experience, and the open-source codebase means that organisations with the technical resources can self-host the entire platform.

A strong alternative for publishers who handle large discussion volumes is GraphComment, a hosted platform developed by the French company Semiologic. It takes a social-network-inspired approach, offering threaded discussions with real-time updates, relevance-based sorting, a reputation-based voting system that enables the community to assist with moderation, and a proprietary Bubble Flow interface that makes individual threads indexable by search engines. All data are stored on servers based in France, which will appeal to publishers with European data-residency requirements. Its client list includes Le Monde, France Info and Les Echos, giving it considerable credibility in the media sector.

Hosted Solutions: Ease of Setup and Performance

Hosted solutions cater to those who prioritise simplicity and page performance above all else. ReplyBox exemplifies this approach, describing itself as 15 times lighter than Disqus, with a design focused on clean aesthetics and fast page loads. It supports Markdown formatting, nested replies, comment upvotes, email notifications and social login via Google, and it comes with spam filtering through Akismet. A 14-day free trial is available with no payment required, and a WordPress plugin is offered for those already on that platform.

Remarkbox takes a similarly restrained approach. Founded in 2014 by Russell Ballestrini after he moved his own blog to a static site and found existing solutions too slow or ad-laden, it is open source, carries no advertising and performs no user tracking. Readers can leave comments without creating an account, using email verification to confirm their identity, and the platform operates on a pay-what-you-can basis that keeps it accessible to smaller sites. It supports Markdown with real-time comment previews and deeply nested replies, and its developer notes that comments served through the platform contribute to SEO by making user-generated content indexable by search engines.

The choice between hosted and self-hosted systems often hinges on the trade-off between convenience and control. Staticman was a notable option in this space, acting as a Node.js bridge that committed comment submissions as data files directly to a GitHub or GitLab repository. However, its website is no longer accessible, and the project has been effectively abandoned since around 2020, with its maintainers publicly confirming in early 2024 that neither they nor the original author have been active on it for some time and that no volunteer has stepped forward to take it over. Those with a need for similar functionality are directed by the project's own contributors towards Cloudflare Workers-based alternatives. Utterances remains a viable option in this category, using GitHub Issues as its backend so that all comment data stays within a repository the site owner already controls. It requires some technical setup, but rewards that effort with complete data ownership and no external dependencies.

Open-Source, Self-Hosted Options

For developers who value privacy and data sovereignty above the convenience of a hosted service, open-source and self-hosted options present a natural fit. Remark42 is an actively maintained project that supports threaded comments, social login, moderation tools and Telegram or email notifications. Written in Python and backed by a SQLite database, Isso has been available since 2013 and offers a straightforward deployment with a small resource footprint, together with anonymous commenting that requires no third-party authentication. Both projects reflect a broader preference among privacy-conscious developers for keeping comment data entirely under their own roof.

The Case of Disqus

Valued for its ease of integration and its social features, Disqus remains one of the most widely recognised hosted commenting platforms. However, it comes with well-documented drawbacks. Disqus operates as both a commenting service and a marketing and data company, collecting browsing data via tracking scripts and sharing it with third-party advertising partners. In 2021, the Norwegian Data Protection Authority notified Disqus of its intention to issue an administrative fine of approximately 2.5 million euros for processing user data without valid consent under the General Data Protection Regulation. Following Disqus's response, the authority's final decision in 2024 was to issue a formal reprimand rather than impose the financial penalty. The proceedings nonetheless drew renewed attention to the privacy implications of relying on the platform. Site owners who prefer the convenience of a hosted service without those trade-offs may find more suitable alternatives in Hyvor Talk or CommentBox, both of which are designed around privacy-first principles and minimal setup.

Bridging the Gap: Talkyard and Discourse

Functioning as both a commenting system and a full community forum, Talkyard occupies an interesting position in the landscape. It can be embedded on a blog in the same manner as a traditional commenting widget, yet it also supports standalone discussion boards, making it a viable option for content creators who anticipate their audience outgrowing a simple comment section.

Discourse operates on a similar principle but at greater scale, providing a fully featured forum platform that can be embedded as a comment section on external pages. Co-founded by Jeff Atwood (also a co-founder of Stack Overflow), Robin Ward and Sam Saffron, it is an open-source project whose server side is built on Ruby on Rails with a PostgreSQL database and Redis cache, while the client side uses Ember.js. Both Talkyard and Discourse are available as hosted services or as self-hosted installations, and both carry open-source codebases for those who wish to inspect or extend them.

Self-Hosting Discourse With Cloudflare CDN

For those who wish to take the self-hosted route, Discourse distributes an official Docker image that considerably simplifies deployment. The process begins by cloning the official repository into /var/discourse and running the bundled setup tool, which prompts for a hostname, administrator email address and SMTP credentials. A Linux server with at least 2 GB of memory is recommended; a machine with only 1 GB can work, provided that swap space is enabled.

Pairing a self-hosted instance with Cloudflare as a global CDN is a practical choice, as Cloudflare provides CDN acceleration, DNS management and DDoS mitigation, with a free tier that suits most community deployments. When configuring SSL, the recommended approach is to select Full mode in the Cloudflare SSL/TLS dashboard and generate an origin certificate using the RSA key type for maximum compatibility. That certificate is then placed in /var/discourse/shared/standalone/ssl/, and the relevant Cloudflare and SSL templates are introduced into Discourse's app.yml configuration file.

One important point during initial DNS setup is to leave the Cloudflare proxy status set to DNS only until the Discourse configuration is complete and verified, switching it to Proxied only afterwards to avoid redirect errors during first deployment. Email setup is among the more demanding aspects of running Discourse, as the platform depends on it for user authentication and notifications. The notification_email setting and the disable_emails option both require attention after a fresh install or a migration restore. Once configuration is finalised, running ./launcher rebuild app from the /var/discourse directory completes the build, typically within ten minutes.

Plugins can be added at any time by specifying their Git repository URLs in the hooks section of app.yml and triggering a rebuild. Discourse creates weekly backups automatically, storing them locally under /var/discourse/shared/standalone/backups, and these can be synchronised offsite via rsync or uploaded automatically to Amazon S3 if credentials are configured in the admin panel.

At a Glance

Solution	Type	Best For
Cactus Comments	Federated, open source	Privacy-centric sites
Coral	Open source, hosted or self-hosted	Publishers and newsrooms
GraphComment	Hosted	Enhanced engagement and SEO
ReplyBox	Hosted	Simple static sites
Remarkbox	Hosted, optional self-host	Speed and simplicity
Utterances	Repository-backed	Developer-owned data
Remark42	Self-hosted, open source	Privacy and control
Isso	Self-hosted, open source	Minimal footprint
Hyvor Talk	Hosted	Privacy-focused ease of use
CommentBox	Hosted	Clean design, minimal setup
Talkyard	Hosted or self-hosted	Comments and forums combined
Discourse	Hosted or self-hosted	Rich discussion communities
Disqus	Hosted	Ease of integration (privacy caveats apply)

Closing Thoughts

None of the options surveyed here is without compromise. The hosted services ask you to accept some degree of cost, design constraint or data trade-off. The self-hosted and repository-backed tools demand technical time that can outweigh the benefit for a small or personal site. The federated approach is principled but asks readers to have, or create, a Matrix account before they can participate. It is entirely reasonable to weigh all of that and, as I did, conclude that going without comments is the right call for now. The landscape does shift, and a solution that is cumbersome today may become more accessible as these projects mature. In the meantime, knowing what exists and where the friction lies is a reasonable place to start.

Security is a behaviour, not a tick-box

11th February 2026

Cybersecurity is often discussed in terms of controls and compliance, yet most security failures begin and end with human action. A growing body of practice now places behaviour at the centre, drawing on psychology, neuroscience, history and economics to help people replace old habits with new ones. George Finney's Well Aware Security has built its entire approach around this idea, reframing awareness training as a driver of measurable outcomes rather than a box-ticking exercise, with coaches helping colleagues identify and build upon their existing strengths. It is also personal by design, using insights about how minds work to guide change one habit at a time rather than expecting wholesale transformation overnight.

This emphasis on behaviour is not a dismissal of technical skill so much as a reminder that skill alone is insufficient. Security is not a competency you either possess or lack; it is a behaviour that can be learned, reinforced and normalised. As social beings, we have always gathered for mutual protection, meaning the desire to contribute to collective security is already present in most people. Turning that impulse into daily action requires structure and patience, and it thrives when a supportive culture takes root.

Culture matters because norms are powerful. In a team where speed and convenience consistently override prudence, individuals who try to do the right thing can feel isolated. Conversely, when an organisation embraces cybersecurity at every level, a small group can create sufficient leverage to shift practices across the whole business. Research has found that organisations with below-average culture ratings are significantly more likely to experience a data breach than their peers, and controls alone cannot close that gap when behaviours are pulling in the opposite direction.

This is why advocates of habit-based security speak of changing one step at a time, celebrating progress and maintaining momentum. The same thinking underpins practical measures at home and at work, where small changes in how devices and data are managed can reduce risk materially without making technology difficult to use.

Network-Wide Blocking with Pi-hole

One concrete example of this approach is network-wide blocking of advertising and tracking domains using a DNS sinkhole. Pi-hole has become popular because it protects all devices on a network without requiring any client-side software to be installed on each one. It runs lightly on Linux, blocks content outside the browser (such as within mobile apps and smart TVs) and can optionally act as a DHCP server so that new devices are protected automatically upon joining the network.

Pi-hole's web dashboard surfaces insights into DNS queries and blocked domains, while a command-line interface and an API offer further control for those who need it. It caches DNS responses to speed up everyday browsing, supports both IPv4 and IPv6, and scales from small households to environments handling very high query volumes. The project is free and open source, sustained by donations and volunteer effort.

Choosing What to Block

Selecting what to block is a point at which behaviour and technology intersect. It is tempting to load every available blocklist in the hope of maximum protection, but as Avoid the Hack notes in its detailed guide to Pi-hole blocklists, more is not always better. Many lists draw from common sources, so stacking them can add redundancy without improving coverage and may increase false positives (instances where legitimate sites are mistakenly blocked).
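To see how much two lists really overlap before stacking them, a short Python sketch can help. It assumes each list has been saved locally as either a hosts-style file or one domain per line; the helper names and the sample domains are made up for illustration and are not part of Pi-hole.

```python
from itertools import combinations


def parse_domains(text):
    """Extract domains from hosts-style ("0.0.0.0 domain") or plain one-per-line text."""
    domains = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        domain = line.split()[-1].lower()  # last field is the domain in both formats
        if domain not in {"localhost", "0.0.0.0"}:
            domains.add(domain)
    return domains


def overlap_report(lists):
    """For each pair of lists, the share of the smaller list found in the other."""
    report = {}
    for (name_a, set_a), (name_b, set_b) in combinations(lists.items(), 2):
        smaller = min(len(set_a), len(set_b))
        report[(name_a, name_b)] = len(set_a & set_b) / smaller if smaller else 0.0
    return report


# Made-up sample data standing in for two downloaded lists.
list_a = parse_domains("""
# sample list A
0.0.0.0 ads.example.com
0.0.0.0 tracker.example.net
0.0.0.0 metrics.example.org
""")
list_b = parse_domains("""
ads.example.com
tracker.example.net
popups.example.io
banner.example.io
""")

report = overlap_report({"A": list_a, "B": list_b})
combined = list_a | list_b
print(report, len(combined))
```

A high ratio for a pair suggests the second list adds little beyond the first, which is the redundancy described above.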

The most effective approach begins by considering what you want to block and why, then balancing that against the requirements of your devices and services. Blocking every Microsoft domain, for instance, could disrupt operating system updates or break websites that rely on Azure. Likewise, blacklisting all domains belonging to a streaming or gaming platform may render apps unusable. Aggressive configurations are possible, but they work best when paired with careful allow-listing of domains essential to your services. Allow lists require ongoing upkeep as services move or change, so they are not a one-off exercise.
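The interplay between block and allow entries can be pictured with a toy decision function. This is only a sketch of the idea, assuming exact domain matches and that an allow entry takes precedence over a block entry; the function name and domains are hypothetical and this is not Pi-hole's own code.

```python
def decide(domain, blocked, allowed):
    """Return a verdict for one queried domain; allow entries win over block entries."""
    name = domain.lower().rstrip(".")  # normalise case and any trailing dot
    if name in allowed:
        return "allowed (allow list)"
    if name in blocked:
        return "blocked"
    return "allowed (not listed)"


# Made-up entries: an aggressive blocklist plus one essential domain allowed back in.
blocked = {"telemetry.example.com", "update.example.com", "ads.example.net"}
allowed = {"update.example.com"}

queries = ["update.example.com", "telemetry.example.com", "www.example.org"]
verdicts = {query: decide(query, blocked, allowed) for query in queries}

for query, verdict in verdicts.items():
    print(f"{query}: {verdict}")
```

If the update service later moves to a different hostname, the allow entry silently stops helping, which is why allow lists need revisiting.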

Recommended Blocklists

A practical starting point is the well-maintained Steven Black unified hosts file, which consolidates several reputable sources and which many users find sufficient on its own. From there, curated collections help tailor coverage further. EasyList provides a widely trusted foundation for blocking advertising and integrates with browser extensions such as uBlock Origin, while its companion list EasyPrivacy can add stronger tracking protection at the cost of occasional breakage on certain sites.

Hagezi maintains a comprehensive set of DNS blocklists, including "multi" variants of different sizes and aggression levels, built from numerous sources. Selecting one of the multi variants is usually preferable to layering many individual category lists, which can reintroduce the overlap you were trying to avoid. Firebog organises its lists by risk: green entries carry a lower risk of causing breakage, while blue entries are more aggressive, giving you the option to mix and match according to your comfort level.

Some projects bundle many sources into a single combination list. OISD is one such option, with its Basic variant focusing mainly on advertisements, Full extending to malware, scams, phishing, telemetry and tracking, and a separate NSFW set covering adult content. OISD updates roughly every 24 hours and is comprehensive enough that many users would not need additional lists. The trade-off is placing a significant degree of trust in a single maintainer and limiting the ability to assign different rule sets to different device groups within Pi-hole, so it is worth weighing convenience against flexibility before committing.

The Blocklist Project organises themed lists covering advertising, tracking, malware, phishing, fraud and social media domains, and these work with both Pi-hole and AdGuard Home. The project completed a full rebuild of its underlying infrastructure, replacing an inconsistent mix of scripts with a properly tested Python pipeline, automated validation on pull requests and a cleaner build process.

Existing list URLs are unchanged, so anyone already using the project's lists need not reconfigure anything. That said, the broader principle holds regardless of which project you use: blocklists can become outdated if not actively maintained, reducing their effectiveness over time.

Using Regular Expressions

For more granular control, Pi-hole supports regular expressions to match domain patterns. Regex entries are powerful and can be applied both to block and to allow traffic, but they reward specificity. Broad patterns risk false positives that break legitimate services, so community-maintained regex recommendations are a safer starting point than writing everything from scratch. Pi-hole's own documentation explains how expressions are evaluated in detail. Used judiciously, regex rules extend what list-based blocking can achieve without turning maintenance into an ongoing burden.
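The difference between a broad and a specific pattern is easy to try out before adding anything to Pi-hole. The sketch below uses Python's re module purely as a test bench; Pi-hole has its own regex engine, so treat this as an approximation, and note that both patterns and all the domains are invented for the example.

```python
import re

# A broad pattern: matches "ad" anywhere in the queried name.
broad = re.compile(r"ad")

# A narrower pattern: only names whose first label is "ad" or "ads",
# optionally followed by digits, then a dot or hyphen.
specific = re.compile(r"^ads?[0-9]*[.-]")

queries = [
    "ad1.example.com",
    "ads.example.net",
    "download.example.org",  # legitimate, but contains "ad"
    "uploads.example.com",   # legitimate, but contains "ad"
]

broad_hits = [q for q in queries if broad.search(q)]
specific_hits = [q for q in queries if specific.search(q)]

print("broad:", broad_hits)
print("specific:", specific_hits)
```

The broad pattern catches the download and upload hosts as well, which is exactly the kind of false positive that breaks legitimate services.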

Installing Pi-hole

Installation is straightforward. Pi-hole can be deployed in a Linux container or directly on a supported operating system using an automated installer that asks a handful of questions and configures everything in under ten minutes. Once running, you point clients to use it as their DNS resolver, either by setting DHCP options on your router, so devices adopt it automatically, or by updating network settings on each device individually. Pairing Pi-hole with a VPN extends ad blocking to mobile devices when away from home, so limited data plans go further by not downloading unwanted content. Day-to-day management is handled via the web interface, where you can add domains to block or allow lists, review query logs, view long-term statistics and audit entries, with privacy modes that can be tuned to your environment.

Device-Level Adjustments

Network filtering is one layer in a defence-in-depth approach, and a few small device-level changes can reduce friction without sacrificing safety. Bitdefender's Safepay, for example, is designed to isolate banking and shopping sessions within a hardened browser environment. If its prompts become intrusive, you can turn off notifications by opening the Bitdefender interface, selecting Privacy, then Safepay settings, and toggling off both Safepay notifications and the option to use a VPN with Safepay. Bookmarked sites can still auto-launch Safepay unless you also disable the automatic-opening option. Even with notifications suppressed, you can start Safepay manually from the dashboard whenever you want the additional protection.

On Windows, unwanted prompts from Microsoft Edge about setting it as the default browser can be handled without resorting to arcane steps. The Windows Club covers the full range of methods available. Dismissing the banner by clicking "Not now" several times usually prevents it from reappearing, though a browser update or reset may bring the message back. Advanced users can disable the recommendations via edge://flags, or apply a registry policy under HKEY_CURRENT_USER\Software\Policies\Microsoft\Edge by setting DefaultBrowserSettingEnabled to 0. In older environments such as Windows 7, a Group Policy setting exists to stop Edge checking whether it is the default browser. These changes should be made with care, particularly in managed environments where administrators enforce default application associations across the estate.
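For anyone who prefers to review a registry change before applying it, the policy can be written out as a .reg file first. The following is a minimal sketch, assuming the key is HKEY_CURRENT_USER\Software\Policies\Microsoft\Edge and that the value is a DWORD; the function name is hypothetical, and the output should be inspected and tried on a non-managed machine before being imported.

```python
def edge_policy_reg(enabled=False):
    """Build .reg file text that sets the Edge default-browser policy (assumed DWORD)."""
    key = r"HKEY_CURRENT_USER\Software\Policies\Microsoft\Edge"
    value = 1 if enabled else 0
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key}]",
        f'"DefaultBrowserSettingEnabled"=dword:{value:08x}',
        "",
    ]
    return "\r\n".join(lines)  # .reg files conventionally use Windows line endings


reg_text = edge_policy_reg(enabled=False)
print(reg_text)
```

Saving the text to a file with a .reg extension keeps the change visible and reversible, since the same file with the value set to 1 restores the earlier behaviour.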

Knowing What Your Devices Reveal

Awareness also begins with understanding what your devices reveal to the wider internet. Services like WhatIsMyIP.com display your public IP address, the approximate location derived from it and your internet service provider. For most home users, a public IP address is dynamic rather than fixed, meaning it can change when a router restarts or when an ISP reallocates addresses; on mobile networks it may change more frequently still as devices move between towers and routing systems.

Such tools also provide lookups for DNS and WHOIS information, and they explain the difference between public and private addressing. Complementary checks from WhatIsMyBrowser.com summarise your browser version, whether JavaScript and cookies are enabled, and whether known trackers or ad blockers are detected. Sharing that information with support teams can make troubleshooting considerably faster, since it quickly narrows down where problems are likely to sit.
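The distinction between public and private addressing can also be checked locally. Here is a small sketch using Python's standard ipaddress module; the describe function and the sample addresses are my own choices for illustration.

```python
import ipaddress


def describe(address):
    """Classify an IPv4 or IPv6 address as loopback, private, public or other."""
    ip = ipaddress.ip_address(address)
    if ip.is_loopback:
        return "loopback"
    if ip.is_private:
        return "private"
    if ip.is_global:
        return "public"
    return "other"


samples = ["192.168.1.10", "10.0.0.5", "127.0.0.1", "8.8.8.8", "fd00::1"]
results = {address: describe(address) for address in samples}

for address, kind in results.items():
    print(f"{address}: {kind}")
```

Addresses in the private ranges are what devices use inside a home network, while the public address reported by a lookup service is the one the router presents to the wider internet.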

Protecting Your Accounts

Checking for Breached Credentials

Account security is another area where habits do most of the heavy lifting. Checking whether your email address appears in known data breaches via Have I Been Pwned helps you decide when to change passwords or enable stronger protections. The service, created by security researcher Troy Hunt, tracks close to a thousand breached websites and over 17.5 billion compromised accounts, and offers notifications as well as a searchable dataset. Finding your address in a breach does not mean your account has been taken over, but it does mean you should avoid reusing passwords and should enable two-factor authentication wherever it is available.

Two-Factor Authentication

Authenticator apps provide time-based codes that attackers cannot guess, even when armed with a reused password. Aegis Authenticator is a free, open-source option for Android that stores your tokens in an encrypted vault with optional biometric unlock. It offers a clean interface with multiple themes, supports icons for quick identification and allows import and export from a wide range of other apps. Backups can be automatic, and you remain in full control, since the app works entirely offline without advertisements or tracking.
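To show where those time-based codes come from, here is a compact sketch of the TOTP calculation described in RFC 6238, using HMAC-SHA1 and a 30-second step. It is an illustration and not Aegis's implementation; the function name is mine, and the secret is the published test key from the RFC, whereas real apps normally receive a Base32-encoded secret from the service.

```python
import hashlib
import hmac
import struct
import time


def totp(secret, at=None, step=30, digits=6):
    """Compute a time-based one-time code from a shared secret (bytes)."""
    now = time.time() if at is None else at
    counter = int(now // step)  # number of whole time steps since the Unix epoch
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation: low nibble picks the slice
    number = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(number % 10 ** digits).zfill(digits)


# Test key from RFC 6238, evaluated at 59 seconds after the epoch.
rfc_secret = b"12345678901234567890"
code_8 = totp(rfc_secret, at=59, digits=8)
code_6 = totp(rfc_secret, at=59)

print(code_8, code_6)
print(totp(rfc_secret))  # code for the current 30-second window
```

Because the code depends on both the secret and the current time window, a reused password alone is not enough for an attacker to produce it.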

For users who prefer cloud backup and multi-device synchronisation, Authy from Twilio offers a popular alternative that pairs straightforward setup with secure backup and support for using tokens across more than one device. Both approaches strengthen accounts significantly, and the choice often comes down to whether you value local control above all else or prefer the convenience of synchronisation.

Password Management

Strong, unique passwords remain essential even alongside two-factor authentication. KeePassXC is a cross-platform password manager for Windows, macOS and Linux that keeps your credentials in an encrypted database stored wherever you choose, rather than on a vendor's servers. It is free and open source under the GPLv3 licence, and its development process is publicly visible on GitHub.

The project has undergone rigorous external scrutiny. On the 17th of November 2025, KeePassXC version 2.7.9 was awarded a Security Visa by the French National Cybersecurity Agency (ANSSI) under its First-level Security Certification (CSPN) programme, with report number ANSSI-CSPN-2025/16. The certification is valid for three years and is recognised in France and by the German Federal Office for Information Security. More recent releases such as version 2.7.11 focus on bug fixes and usability improvements, including import enhancements, better password-generation feedback and refinements to browser integration. Because data are stored locally, you can place the database in a private or shared cloud folder if you wish to sync between devices, while encryption remains entirely under your control.

Secure Email with Tuta

Email is a frequent target for attackers and a common source of data leakage, so the choice of client can make a meaningful difference. Tuta provides open-source desktop applications for Linux, Windows and macOS that bring its end-to-end encrypted mail and calendar to the desktop with features that go beyond the web interface. The clients are signed so that updates can be verified independently, and Tuta publishes its public key, so users can confirm signatures themselves.

There is a particular focus on Linux, with support for major distributions including Ubuntu, Debian, Fedora, Arch Linux, openSUSE and Linux Mint. Deep operating-system integration enables conveniences such as opening files as attachments directly from context menus on Windows via MAPI, setting Tuta as the default mail handler, using the system's secret storage and applying multi-language spell-checking. Hardware key support via U2F is available across all desktop clients, and offline mode means previously indexed emails, calendars and contacts remain accessible without an internet connection.

Tuta does not support IMAP because downloading and storing messages unencrypted on devices would undermine its end-to-end encryption model. Instead, features such as import and export are built directly into the clients; paid plans including Legend and Unlimited currently include email import that encrypts messages locally before uploading them. The applications are built on Electron to maintain feature parity across platforms, and Tuta offers the desktop clients free to all users to ensure that core security benefits are not gated behind a subscription.

Bringing Culture and Tooling Together

These individual strands reinforce one another when combined. A network-wide blocker reduces exposure to malvertising and tracking while nudging everyone in a household or office towards safer defaults. Small device-level settings cut noise without removing protection, which helps people maintain good habits because security becomes less intrusive. Visibility tools demystify what the internet can see and how browsers behave, which builds confidence. Password managers and authenticator apps make strong credentials and second factors the norm rather than the exception, and a secure email client protects communications by default.

None of these steps requires perfection, and each can be introduced one at a time. The key is to focus on outcomes, think like a coach and make security personal, so that habits take root and last.

There is no single fix that will stop every attack. One approach that does help is consistent behaviour supported by thoughtful choices of software and services. Start with one change that removes friction while adding protection, then build from there. Over time, those choices shape a culture in which people feel they have a genuine role in keeping themselves and their organisations safe, and the technology they rely upon reflects that commitment.