TOPIC: GITHUB
Open Source Tools for Pharmaceutical Clinical Data Reporting, Analysis & Regulatory Submissions
There was a time when SAS was the predominant technology for clinical data reporting, analysis and submission work in the pharmaceutical industry. Within the last decade, open-source alternatives have gained a lot of traction, and the {pharmaverse} initiative has arisen from this. The range of packages ranges from dataset creation (SDTM and ADaM) to output production, with utilities for test data and submission activities. All in all, there is quite a range here. The effort also is a marked change from each company working by itself to sharing and collaborating with others. Here then is the outcome of their endeavours.
{admiral}
Designed as an open-source, modular R toolbox, the {admiral} package assists in the creation of ADaM datasets through reusable functions and utilities tailored for pharmaceutical data analysis. Core packages handle general ADaM derivations whilst therapeutic area-specific extensions address more specialised needs, with a structured release schedule divided into two phases. Usability, simplicity and readability are central priorities, supported by comprehensive documentation, vignettes and example scripts. Community contributions and collaboration are actively encouraged, with the aim of fostering a shared, industry-wide approach to ADaM development in R. Related packages for test data and metadata manipulation complement the main toolkit, alongside a commitment to consistent coding practices and accessible code.
{aNCA}
Maintained by contributors from F. Hoffmann-La Roche AG, {aNCA} is an open-source R Shiny application that makes Non-Compartmental Analysis (NCA) accessible to scientists working with clinical and pre-clinical pharmacokinetic datasets. Users can upload their own data, apply pre-processing filters and run NCA with configurable options including half-life calculation rules, manual slope selection and user-defined AUC intervals. Results are explorable through interactive box plots, scatter plots and summary statistics tables, and can be exported in `PP` and `ADPP` dataset domains alongside a reproducible R script. Analysis settings can be saved and reloaded for continuity across sessions. Installation is available from CRAN via a standard install command, from GitHub using the `pak` package manager, or by cloning the repository directly for those wishing to contribute.
{autoslider.core}
The {autoslider.core} package generates standard table templates commonly used in Study Results Endorsement Plans. Its principal purpose is to reduce duplicated effort between statisticians and programmers when creating slides. Available on CRAN, the package can be installed either through the standard installation method or directly from GitHub for the latest development version.
{cards}
Supporting the CDISC Analysis Results Standard, the {cards} package facilitates the creation of analysis results data sets that enhance automation, reproducibility and consistency in clinical research. Structured data sets for statistical summaries are generated to enable tasks such as quality control, pre-calculating statistics for reports and combining results across studies. Tools for creating, modifying and analysing these data sets are provided, with the {cardx} extension offering additional functions for statistical tests and models. Installation is available through CRAN or GitHub, with resources including documentation and community contributions.
{cardx}
Extending the {cards} package, {cardx} facilitates the creation of Analysis Results Data Objects (ARD's) in R by leveraging utility functions from {cards} and statistical methods from packages such as {stats} and {emmeans}. These ARD's enable the generation of tables and visualisations for regulatory submissions, support quality control checks by storing both results and parameters, and allow for reproducible analyses through the inclusion of function inputs. Installation options include CRAN and GitHub, with examples demonstrating its use in t-tests and regression models. External statistical library dependencies are not enforced by the package, requiring explicit references in code for tools like {renv} to track them.
{chevron}
A collection of high-level functions for generating standardised outputs in clinical trials reporting, {chevron} covers a broad range of output types including tables for safety summaries, adverse events, demographics, ECG results, laboratory findings, medical history, response data, time-to-event analyses and vital signs, as well as listings and graphs such as Kaplan-Meier and mean plots. Straightforward implementation with limited parameterisation is a defining characteristic of the package. It is available on CRAN, with a development version accessible via GitHub, and those requiring greater flexibility are directed to the related {tern} package and its associated catalogue.
{clinify}
Built on the {flextable} and {officer} packages, {clinify} streamlines the creation of clinical tables, listings and figures whilst addressing challenges such as adherence to organisational reporting standards, the need for flexibility across different clients and the importance of reusable configurations. Compatibility with existing tools is a key priority, ensuring that its features do not interfere with the core functionalities of {flextable} or {officer}, whilst enabling tasks like dynamic page breaks, grouped headers and customisable formatting. Complex documents such as Word files with consistent layouts and tailored elements like footnotes and titles can be produced with reduced effort by building on these established frameworks.
{connector}
Offering a unified interface for establishing connections to various data sources, the {connector} package covers file systems and databases through a central configuration file that maintains consistent references across project scripts and facilitates switching between data sources. Functions such as connector_fs() for file system access and connector_dbi() for database connections are provided, with additional expansion packages enabling integration with specific platforms like Databricks and SharePoint. Installation is available via CRAN or GitHub, and usage involves defining a YAML configuration file to specify connection details that can then be initialised and utilised to interact with data sources. Operations including reading, writing and listing content are supported, with methods for managing connections and handling data in formats like parquet.
{covtracer}
Linking test traces to package code and documentation using coverage data from {covr}, the {covtracer} package enables the creation of a traceability matrix that maps tests to specific documented functions. Installation is via remotes from GitHub with specific dependencies, and configuration of {covr} is required to record tests alongside coverage traces. Untested behaviours can be identified and the direct testing of functions assessed, providing insights into test coverage and software validation. The example workflow demonstrates generating a matrix to show which tests evaluate code related to documented behaviours, highlighting gaps in test coverage.
{datacutr}
An open-source solution for applying data cuts to SDTM datasets within R, the {datacutr} package is designed to support pharmaceutical data analysis workflows. Available via CRAN or GitHub, it offers options for different types of cuts tailored to specific SDTM domains. Supplemental qualifiers are assumed to be merged with their parent domain before processing, allowing users flexibility in defining cut types such as patient, date, or domain-specific cuts. Documentation, contribution guidelines and community support through platforms like Slack and GitHub provide further assistance.
{datasetjson}
Facilitating the creation and manipulation of CDISC Dataset JSON formatted datasets, the {datasetjson} R package enables users to generate structured data files by applying metadata attributes to data frames. Metadata such as file paths, study identifiers and system details can be incorporated into dataset objects and written to disk or returned as JSON text. Reading JSON files back into data frames is also supported, with metadata preserved as attributes for use in analysis. The package currently supports version 1.1.0 of the Dataset JSON standard and is available via CRAN or GitHub.
{dataviewR}
An interactive data viewer for R, {dataviewR} enhances data exploration through a Shiny-based interface that enables users to examine data frames and tibbles with tools for filtering, column selection and generating reproducible {dplyr} code. Viewing multiple datasets simultaneously is supported, and the tool provides metadata insights alongside features for importing and exporting data, all within a responsive and user-friendly design. By combining intuitive navigation with automated code generation, the package aims to streamline data analysis workflows and improve the efficiency of dataset manipulation and documentation.
{docorator}
Generating formatted documents by adding headers, footers and page numbers to displays such as tables and figures, {docorator} exports outputs as PDF or RTF files. Accepted inputs include tables created with the {gt} package, figures generated using {ggplot2}, or paths to existing PNG files, and users can customise document elements like titles and footers. The package can be installed from CRAN or via GitHub, and its use involves creating a display object with specified formatting options before rendering the output. LaTeX libraries are required for PDF generation.
{envsetup}
Providing a configuration system for managing R project environments, the {envsetup} package enables adaptation to different deployment stages such as development, testing and production without altering code. YAML files are used to define paths for data and output directories, and R scripts are automatically sourced from specified locations to reduce the need for manual configuration changes. This approach supports consistent code usage across environments whilst allowing flexibility in environment-specific settings, streamlining workflows for projects requiring multiple deployment contexts.
{ggsurvfit}
Simplifying the creation of survival analysis visualisations using {ggplot2}, the {ggsurvfit} package offers tools to generate publication-ready figures with features such as confidence intervals, risk tables and quantile markers. Seamless integration with {ggplot2} functions allows for extensive customisation of plot elements whilst maintaining alignment between graphical components and annotations. Competing risks analysis is supported through `ggcuminc()`, and specific functions such as Surv_CNSR() handle CDISC ADaM `ADTTE` data by adjusting event coding conventions to prevent errors. Installation options are available via CRAN or GitHub, with examples and further resources accessible through its documentation and community links.
{gridify}
Addressing challenges in creating consistent and customisable graphical arrangements for figures and tables, the {gridify} package leverages the base {grid} package to facilitate the addition of headers, footers, captions and other contextual elements through predefined or custom layouts. Multiple input types are supported, including {ggplot2}, {flextable} and base R plots, and the workflow involves generating an object, selecting a layout and using functions to populate text elements before rendering the final output. Installation options include CRAN and GitHub, with examples demonstrating its application in enhancing tables with metadata and formatting. Uniformity across different projects is promoted, reducing manual adjustments and aligning visual elements consistently.
{gtsummary}
Offering a streamlined approach to generating publication-quality analytical and summary tables in R, the {gtsummary} package enables users to summarise datasets, regression models and other statistical outputs with minimal code. Variable types are identified automatically, relevant descriptive statistics computed and measures of data incompleteness included, whilst customisation of table formatting such as adjusting labels, adding p-values or merging tables for comparative analysis is also supported. Integration with packages like {broom} and {gt} facilitates the creation of visually appealing tables, and results can be exported to multiple formats including HTML, Word and LaTeX, making the package suitable for reproducible reporting in academic and professional contexts.
{logrx}
Supporting logging in clinical programming environments, the {logrx} package generates detailed logs for R scripts, ensuring code execution is traceable and reproducible. An overview of script execution and the associated environment is provided, enabling users to recreate conditions for verification or further analysis. Available on CRAN, installation is possible via standard methods or from its development repository, offering flexibility for both file-based and scripted usage. Structured logging tailored to the specific requirements of clinical applications is the defining characteristic of the package, with simplicity and minimal intrusion in coding workflows maintained throughout.
{metacore}
Providing a standardised framework for managing metadata within R sessions, the {metacore} package is particularly suited to clinical trial data analysis. Metadata is organised into six interconnected tables covering dataset specifications, variable details, value definitions, derivations, code lists and supplemental information, ensuring consistency and ease of access. By centralising metadata in a structured, immutable format, the package facilitates the development of tools that can leverage this information across different workflows, reducing the need for redundant data structures. Reading metadata from various sources, including Define-XML 2.0, is also supported.
{metatools}
Working with {metacore} objects, {metatools} enables users to build datasets, enhance columns in existing datasets and validate data against metadata specifications. Installation is available from CRAN or via GitHub. Core functionality includes pulling columns from existing datasets, creating new categorical variables, converting columns to factors and running checks to verify that data conforms to control terminology and that all expected variables are present.
{pharmaRTF}
Developed to address gaps in RTF output capabilities within R, {pharmaRTF} is a package for pharmaceutical industry programmers who produce RTF documents for clinical trial data analysis. Whilst the {huxtable} package offers extensive RTF styling and formatting options, it lacks the ability to set document properties such as page size and orientation, repeat column headers across pages, or create multi-level titles and footnotes within document headers and footers. These limitations are resolved by {pharmaRTF}, which wraps around {huxtable} tables to provide document property controls, proper multipage display and title and footnote management within headers and footers. Two core objects form the basis of the package: rtf_doc for document-wide attributes and hf_line for creating individual title and footnote lines, each carrying formatting properties such as alignment, font and bold or italic styling. Default output files use Courier New at 12-point size, Letter page dimensions in landscape orientation with one-inch margins, though all of these can be adjusted through property functions. The package is available on CRAN and supports both a {tidyverse} piping style and a more traditional assignment-based coding approach.
{pharmaverseadam}
Serving as a repository for ADaM test datasets generated by executing templates from related packages such as {admiral} and its extensions, the {pharmaverseadam} package automates dataset creation through a script that installs required packages, runs templates and saves results. Metadata is managed centrally in an XLSX file to ensure consistency in documentation, and updates occur regularly or ad-hoc when templates change. Documentation is generated automatically from metadata and saved as `.R` files, and the package includes contributions from multiple developers with examples provided for each dataset. Preparing metadata, updating configuration files for new therapeutic areas and executing a script to generate datasets and documentation ensures alignment with the latest versions of dependent packages. Installation is available via CRAN or GitHub.
{pharmaverseraw}
Providing raw datasets to support the creation of SDTM datasets, the {pharmaverseraw} package includes examples that are independent of specific electronic data capture systems or data standards such as CDASH. Datasets are named using SDTM domain identifiers with the suffix _raw, and installation options include CRAN or direct GitHub access. Updates involve contributing via GitHub issues, generating new or modified datasets through standalone R scripts stored in the data-raw folder, and ensuring generated files are saved in the data folder as .rda files with consistent naming. Documentation is maintained in R/*.R files, and changes require updating `NAMESPACE` and `.Rd` files using devtools::document.
{pharmaversesdtm}
A collection of test datasets formatted according to the SDTM standard, the {pharmaversesdtm} package is designed for use within the pharmaverse family of packages. Datasets applicable across therapeutic areas, such as `DM` and VS, are included alongside those specific to particular areas, like `RS` and `OE`. Available via CRAN and GitHub, the package provides installation instructions for both stable and development versions, with test data sourced from the CDISC pilot project and ad-hoc datasets generated by the {admiral} team. Naming conventions distinguish between general and therapeutic area-specific categories, with examples such as dm for general use and rs_onco for oncology-specific data. Updates involve creating or modifying R scripts in the data-raw folder, generating `.rda` files and updating metadata in a central JSON file to automate documentation and maintain consistency, including specifying dataset details like labels, descriptions and therapeutic areas.
{pkglite} (R)
Converting R package source code into text files and reconstructing package structures from those files, {pkglite} enables the exchange and management of R packages as plain text. Single or multiple packages can be processed through functions that collate, pack and unpack files, with installation options available via CRAN or GitHub. The tool adheres to a defined format for text files and includes documentation for generating specifications and managing file collections.
{pkglite} (Python)
An open-source framework licensed under the MIT licence, {pkglite} for Python, allows source projects written in any programming language to be packed into portable files and restored to their original directory structure. Installation is available via PyPI or as a development version cloned from GitHub, and the package can also be run without installation using `uvx`. A command line interface is provided in addition to the Python API, which can be installed globally using `pipx`.
{rhino}
Streamlining the development of high-quality, enterprise-grade Shiny applications, {rhino} integrates software engineering best practices, modular code structures and robust testing frameworks. Scalable architecture is supported through modularisation, code quality is enhanced with unit and end-to-end testing, and automation is facilitated via tools for project setup, continuous integration and dependency management. Comprehensive documentation is divided into tutorials, explanations and guides, with examples and resources available for learning.
{risk.assessr}
Evaluating the reliability and security of R packages during validation, the {risk.assessr} package analyses maintenance, documentation and dependencies through metrics such as R CMD check results, unit test coverage and dependency assessments. A traceability matrix linking functions to tests is generated, and risk profiles are based on predefined thresholds including documentation completeness, licence type and code coverage. The tool supports installation from GitHub or CRAN, processes local package files or `renv.lock` dependencies and offers detailed outputs such as risk analysis, dependency lists and reverse dependency information. Advanced features include identifying potential issues in suggested package dependencies and generating HTML reports for risk evaluation, with applications in clinical trial workflows and package validation processes.
{riskassessment}
Built on the {riskmetric} framework, the {riskassessment} application offers a user-friendly interface for evaluating the risk of using R packages within regulated industries, assessing development practices, documentation and sustainability. Non-technical users can review {riskmetric} outputs, add personalised comments, categorise packages into risk levels, generate reports and store assessments securely, with features such as user authentication and role-based access. Alignment with validation principles outlined by the R Validation Hub supports decision-making in regulated settings, though deeper software inspection may be required in some cases. Deployment is possible using tools like Shiny Server or Posit Connect, with installation options including GitHub and local configuration via {renv}.
{riskmetric}
Providing a framework for evaluating the quality of R packages, the {riskmetric} package assesses development practices, documentation, community engagement and sustainability through a series of metrics. Currently operating in a maintenance-only phase, further development is focused on a new tool called {val.metre}. The workflow involves retrieving package information, assessing it against predefined criteria and generating a risk score, with installation available from CRAN or GitHub. An associated application, {riskassessment}, offers a user interface for organisations to review and manage package risk assessments, store metrics and apply organisational rules.
{rlistings}
Designed to create and display formatted listings with a focus on ASCII rendering for tables and regulatory-ready outputs, the {rlistings} R package relies on the {formatters} package for formatting infrastructure. Requirements such as flexible pagination, multiple output formats and repeated key columns informed its development. Available on CRAN and GitHub, the package is under active development and includes features such as adjustable column widths, alignment and support for titles and footnotes.
{rtables}
Tailored for generating submission-ready tables for health authority review, the {rtables} R package creates and displays complex tables with advanced formatting and output options that support regulatory requirements for clinical trial data presentation. Separation of data values from their visualisation is enabled, multiple values can be included within cells, and flexible tabulation and formatting capabilities are provided, including cell spans, rounding and alignment. Output formats include HTML, ASCII, LaTeX, PDF and PowerPoint, with additional formats under development. Also, the package incorporates features such as pagination, distinction between data names and labels for CDISC standards and support for titles and footnotes. Installation is available via CRAN or GitHub, with ongoing community support and training resources.
{rtflite}
A lightweight Python library focused on precise formatting of production-quality tables and figures, {rtflite} is designed for composing RTF documents. Installation is available via PyPI or directly from its GitHub repository, with optional dependencies available to enable DOCX assembly support and RTF-to-PDF or RTF-to-DOCX conversion via LibreOffice.
{sdtm.oak}
Offering a modular, open-source solution for generating CDISC SDTM datasets, the {sdtm.oak} R package is designed to work across different electronic data capture systems and data standards. Industry challenges related to inconsistent raw data structures and varying data collection practices are addressed through reusable algorithms that map raw datasets to SDTM domains, with current capabilities covering Findings, Events and Intervention classes. Future developments aim to expand domain support, introduce metadata-driven code generation and enhance automation potential, though sponsor-specific metadata management tasks are not yet handled by the package. Available on CRAN and GitHub, development is ongoing with refinements based on user feedback and evolving SDTM requirements.
{sdtmchecks}
Providing functions to detect common data issues in SDTM datasets, the {sdtmchecks} package is designed to be broadly applicable and useful for analysis. Installation is available from CRAN or via GitHub, with development versions accessible through specific repositories, and users are not required to specify SDTM versions. A range of data check functions stored as R scripts is included, and contributions are encouraged that maintain flexibility across different data standards.
{siera}
Facilitating the generation of Analysis Results Datasets (ARD's) by processing Analysis Results Standard (ARS) metadata, the {siera} package works with parameters such as analysis sets, groupings, data subsets and methods. Metadata is typically provided in JSON format and used to create R scripts automatically that, when executed with corresponding ADaM datasets, produce ARD's in a structured format. The package can be installed from CRAN or GitHub, and its primary function, `readARS`, requires an ARS file, an output directory and access to relevant ADaM data. The CDISC Analysis Results Standard underpins this process, promoting automation and consistency in analysis outcomes.
{teal}
An open-source, Shiny-based interactive framework for exploratory data analysis, {teal} is developed as part of the pharmaverse ecosystem and maintained by F. Hoffmann-La Roche AG alongside a broad community of contributors. Analytical applications are built by combining supported data types, including CDISC clinical trial data, independent or relational datasets and `MultiAssayExperiment` objects, with modular analytical components known as teal modules. These modules can be drawn from dedicated packages covering general data exploration, clinical reporting and multi-omics analysis and define the specific analyses presented within an application. A suite of companion packages handles logging, reproducibility, data loading, filtering, reporting and transformation. The package is available on CRAN and is under active development, with community support provided through the {pharmaverse} Slack workspace.
{tern}
Supporting clinical trial reporting through a broad range of analysis functions, the {tern} R package offers data visualisation capabilities including line plots, Kaplan-Meier plots, forest plots, waterfall plots and Bland-Altman plots. Statistical model fit summaries for logistic and Cox regression are also provided, along with numerous analysis and summary table functions. Many of these outputs can be integrated into interactive Teal Shiny applications via the {teal.modules.clinical} package.
{tfrmt}
Offering a structured approach to defining and applying formatting rules for data displays in clinical trials, the {tfrmt} package streamlines the creation of mock displays, aligns with industry-standard Analysis Results Data (ARD) formats and integrates formatting tasks into the programming workflow to reduce manual effort and rework. Metadata is leveraged to automate styling and layout, enabling standardised formatting with minimal code, supporting quality control before final output and facilitating the reuse of datasets across different table types. Built on the {gt} package, the tool provides a flexible interface for generating tables and mock-ups, allowing users to focus on data interpretation rather than repetitive formatting tasks.
{tfrmtbuilder}
A tool for defining display-related metadata to streamline the creation and modification of table formats, the {tfrmtbuilder} package supports workflows such as generating tables from scratch, using templates or editing existing ones. Features include a toggle to switch between mock and real data, options to load or create datasets, tools for mapping and formatting data and the ability to export results as JSON, HTML or PNG. Designed for use in study planning and analysis phases, the package allows users to manage table structures efficiently.
{tidyCDISC}
An open-source R Shiny application, {tidyCDISC} is designed to help clinical personnel explore and analyse ADaM-standard data sets without writing any code. Customised clinical tables can be generated through a point-and-click interface, trends across patient populations examined using dynamic figures and individual patient profiles explored in detail. A broad range of users is served, from clinical heads with no programming background to statisticians and statistical programmers, with reported time savings of around 95% for routine trial analysis tasks. The app accepts only `sas7bdat` files conforming to CDISC ADaM standards and includes a feature to export reproducible R scripts from its table generator. A demo version is available without installation using CDISC pilot data, whilst uploading study data requires installing the package from CRAN or via GitHub.
{tidytlg}
Facilitating the creation of tables, listings and graphs using the {tidyverse} framework, the {tidytlg} package offers two approaches: a functional method involving custom scripts for each output and a metadata-driven method that leverages column and table metadata to generate results automatically. Tools for data analysis, including frequency tables and univariate statistics, are included alongside support for exporting outputs to formatted documents.
{Tplyr}
Simplifying the creation of clinical data summaries by breaking down complex tables into reusable layers, {Tplyr} allows users to focus on presentation rather than repetitive data processing. The conceptual approach of {dplyr} is mirrored but applied to common clinical table types, such as counting event-based variables, generating descriptive statistics for continuous data and categorising numerical ranges. Metadata is included with each summary produced to ensure traceability from raw data to final output, and user-acceptance testing documentation is provided to support its use in regulated environments. Installation options are available via CRAN or GitHub, accompanied by detailed vignettes covering features like layer templates, metadata extension and styled table outputs.
{valtools}
Streamlining the validation of R packages used in clinical research and drug development, {valtools} offers templates and functions to support tasks such as setting up validation frameworks, managing requirements and test cases and generating reports. Developed by the R Package Validation Framework PHUSE Working Group, the package integrates with standard development tools and provides functions prefixed with `vt` to facilitate structured validation processes including infrastructure setup, documentation creation and automated checks. Generating validation reports, scraping metadata from validation configurations and executing validation workflows through temporary installations or existing packages are all supported.
{whirl}
Facilitating the execution of scripts in batch mode whilst generating detailed logs that meet regulatory requirements, the {whirl} package produces logs including script status, execution timestamps, environment details, package versions and environmental variables, presented in a structured HTML format. Individual or multiple scripts can be run simultaneously, with parallel processing enabled through specified worker counts. A configuration file allows scripts to be executed in sequential steps, ensuring dependencies are respected, and the package produces individual logs for each script alongside a summary log and a tibble summarising execution outcomes. Installation options include CRAN and GitHub, with documentation available for customisation and advanced usage.
{xportr}
Assisting clinical programmers in preparing CDISC compliant XPT files for clinical data sets, the {xportr} package associates metadata with R data frames, performs validation checks and converts data into transportable SAS v5 XPT format. Tools are included to define variable types, set appropriate lengths, apply labels, format data, reorder variables and assign dataset labels, ensuring adherence to standards such as variable naming conventions, character length limits and the absence of non-ASCII characters. A practical example demonstrates how to use a specification file to apply these transformations to an ADSL dataset, ultimately generating a compliant XPT file.
Some R packages to explore as you find your feet with the language
Here are some commonly used R packages and other tools that are pervasive, along with others that I have encountered while getting started with the language, itself becoming pervasive in my line of business. The collection grew organically as my explorations proceeded, and reflects what I was trying out during my acclimatisation.
General
Here are two general packages to get things started, with one of them being unavoidable in the R world. The other is more advanced, possibly offering more to package developers.
You cannot use R without knowing about this collection of packages. In many ways, they form a mini-language of their own, drawing some criticism from those who reckon that base R functionality covers a sufficient gamut anyway. Nevertheless, there is so much here that will get you going with data wrangling and visualisation that it is worth knowing what is possible. The complaints may come from your not needing to use anything else for these purposes.
This R package enables developers to convert existing R functions into web API endpoints by adding roxygen2-like comment annotations to their code. Once annotated, functions can handle HTTP GET and POST requests, accept query string or JSON parameters and return outputs such as plain values or rendered plots. The package is available on CRAN as a stable release, with a development version hosted on GitHub. For deployment, it integrates with DigitalOcean through a companion package called {plumberDeploy}, and also supports Posit Connect, PM2 and Docker as hosting options. Related projects in the same space include OpenCPU, which is designed for hosting R APIs in scientific research contexts, and the now-discontinued jug package, which took a more programmatic approach to API construction.
Data Preparation
You simply cannot avoid working with data during any analysis or reporting work. While there is a learning curve if you are used to other languages, there is little doubt that R is well-endowed when it comes to performing these tasks. Here are some packages that extend base R capabilities and might even add some extra user-friendliness along the way.
The {forcats} package in R provides functions to manage categorical variables by reordering factor levels, collapsing infrequent values and adjusting their sequence based on frequency or other variables. It includes tools such as reordering by another variable, grouping rare categories into 'other' and modifying level order manually, which are useful for data analysis and visualisation workflows. Designed as part of the tidyverse, it integrates with other packages to streamline tasks like counting and plotting categorical data, enhancing clarity and efficiency in handling factors within R.
Around this time last year, I remember completing a LinkedIn course on a set of good practices known as tidy data, where each variable occupies a column, each observation a row and each value a single cell. This package is designed to help users restructure data so it follows those rules. It provides tools for reshaping data between long and wide formats, handling nested lists, splitting or combining columns, managing missing values and layering or flattening grouped data.
Installation options include the {tidyverse} collection, standalone installation, or the development version from GitHub. The package succeeds earlier reshaping tools like {reshape2} and {reshape}, offering a focused approach to tidying data rather than general reshaping or aggregation.
Having a long track record of working with SAS, {haven} with its abilities to read and write data files from statistical software such as SAS, SPSS and Stata, leveraging the ReadStat library, arouses my interest. Handily, it supports a range of file formats, including SAS transport and data files, SPSS system and older portable files and Stata data files up to version 15, converting these into tibbles with enhanced printing capabilities. Value labels are preserved as a labelled class, allowing conversion to factors, while dates and times are transformed into standard R classes.
While there are other approaches to working with databases using R, {RMariaDB} provides a database interface and driver for MariaDB, designed to fully comply with the DBI specification and serve as a replacement for the older {RMySQL} package. It supports connecting to databases using configuration files, executing queries, reading and writing data tables and managing results in chunks. Installation options include binary packages from CRAN or development versions from GitHub, with additional dependencies such as MariaDB Connector/C or libmysqlclient required for Linux and macOS systems. Configuration is typically handled through a MariaDB-specific file, and the package includes acknowledgments for contributions from various developers and organisations.
For many people, the pandemic may be a fading memory, yet it offered its chances for learning R, not least because there was a use case with more than a hint of personal interest about it. Here is a library making it easier to get hold of the data, with some added pre-processing too. Memories of how I needed to wrangle what was published by various sources make me appreciate just how vital it is to have harmonised data for analysis work.
Table Production
While many appear to graphical presentation of results to their tabular display, R does have its options here too. In recent times, the options have improved, particularly of the pharmaverse initiative. Here is a selection of what I found during my explorations.
Part of the {officeverse} along with {officedown}, {Flextable}, {Rvg} and {mschart}, the {officer} R package enables users to create and modify Word and PowerPoint documents directly from R, allowing the insertion of images, tables and formatted content, as well as the import of document content into data frames. It supports the generation of RTF files and integrates with other packages for advanced features such as vector graphics and native office charts. Installation options include CRAN and GitHub, with community resources available for assistance and contributions. The package facilitates the manipulation of document elements like paragraphs, tables and section breaks and provides tools for exporting and importing content between R and office formats, alongside functions for managing slide layouts and embedded objects in presentations.
If you work in clinical research like I do, the need to produce data tabulations is a non-negotiable requirement. That is how this package came to be developed and the pharmaverse of which it is part has numerous other options, should you need to look at using one of those. The flavour of RTF produced here is the Microsoft Word variety, which did not look as well in LibreOffice Writer when I last looked at the results with that open-source alternative. Otherwise, the results look well to many eyes.
Here is a package that enhances data presentation by applying customisable formatting to vectors and data frames, supporting formats such as percentages, currency and accounting. Available on GitHub and CRAN, it integrates with dynamic document tools like {knitr} and {rmarkdown} to produce visually distinct tables, with features including gradient colour scales, conditional styling and icon-based representations. It automatically converts to {htmlwidgets} in interactive environments and is licensed under MIT, enabling flexible use in both static and interactive data displays.
The {reactable} package for R provides interactive data tables built on the React Table library, offering features such as sorting, filtering, pagination, grouping with aggregation, virtual scrolling for large datasets and support for custom rendering through R or JavaScript. It integrates seamlessly into R Markdown documents and Shiny applications, enabling the use of HTML widgets and conditional styling. Installation options include CRAN and GitHub, with examples demonstrating its application across various datasets and scenarios. The package supports major web browsers and is licensed under MIT, designed for developers seeking dynamic data presentation tools within the R ecosystem.
Particularly useful in dynamic web applications like Shiny, the {DT} package in R provides a means of rendering interactive HTML tables by building on the DataTables JavaScript library. It supports features including sorting, searching, pagination and advanced filtering, with numeric, date and time columns using range-based sliders whilst factor and character columns rely on search boxes or dropdowns. Filtering operates on the client side by default, though server-side processing is also available. JavaScript callbacks can be injected after initialisation to manipulate table behaviour, such as enabling automatic page navigation or adding child rows to display additional detail. HTML content is escaped by default as a safeguard against cross-site scripting attacks, with the option to adjust this on a per-column basis. Whilst the package integrates with Shiny applications, attention is needed around scrolling and slider positioning to prevent layout problems. Overall, the package is well suited to exploratory data analysis and the building of interactive dashboards.
The {gt} package in R enables users to create well-structured tables with a variety of formatting options, starting from data frames or tibbles and incorporating elements such as headers, footers and customised column labels. It supports output in HTML, LaTeX and RTF formats and includes example datasets for experimentation. The package prioritises simplicity for common tasks while offering advanced functions for detailed customisation, with installation available via CRAN or GitHub. Users can access resources like documentation, community forums and example projects to explore its capabilities, and it is supported by a range of related packages that extend its functionality.
Enabling users to produce publication-ready outputs with minimal code, the {gtsummary} package offers a streamlined approach to generating analytical and summary tables in R. It automates the summarisation of data frames, regression models and other datasets, identifying variable types and calculating relevant statistics, including measures of data incompleteness. Customisation options allow for formatting, merging and styling tables to suit specific needs, while integration with packages such as {broom} and {gt} facilitates seamless incorporation into R Markdown workflows. The package supports the creation of side-by-side regression tables and provides tools for exporting results as images, HTML, Word, or LaTeX files, enhancing flexibility for reporting and sharing findings.
Here is an R package designed to generate LaTeX and HTML tables with a modern, user-friendly interface, offering extensive control over styling, formatting, alignment and layout. It supports features such as custom borders, padding, background colours and cell spanning across rows or columns, with tables modifiable using standard R subsetting or dplyr functions. Examples demonstrate its use for creating simple tables, applying conditional formatting and producing regression output with statistical details. The package also facilitates quick export to formats like PDF, DOCX, HTML and XLSX. Installation options include CRAN, R-Universe and GitHub, while the name reflects its origins as an enhanced version of the {xtable} package. The logo was generated using the package itself, and the background design draws inspiration from Piet Mondrian’s artwork.
Figure Generation
R has such a reputation for graphical presentations that it is cited as a strong reason to explore what the ecosystem has to offer. While base R itself is not shabby when it comes to creating graphs and charts, these packages will extend things by quite a way. In fact, the first on this list is near enough pervasive.
Though its default formatting does not appeal to me, the myriad of options makes this a very flexible tool, albeit at the expense of some code verbosity. Multi-panel plots are not among its strengths, which may send you elsewhere for that need.
Focusing on features not included in the core library, the {ggforce} package extends {ggplot2} by offering additional tools to enhance data visualisation. Designed to complement the primary role of {ggplot2} in exploratory data analysis, it provides a range of geoms, stats and other components that are well-documented and implemented, aiming to support more complex and custom plot compositions. Available for installation via CRAN or GitHub, the package includes a variety of functionalities described in detail on its associated website, though specific examples are not included here.
Developed by Claus O. Wilke for internal use in his lab, {cowplot} is an R package designed to help with the creation of publication-quality figures built on top of {ggplot2}. It provides a set of themes, tools for aligning and arranging plots into compound figures and functions for annotating plots or combining them with images. The package can be installed directly from CRAN or as a development version via GitHub, and it has seen widespread use in the book Fundamentals of Data Visualisation.
The {sjPlot} package provides a range of tools for visualising data and statistical results commonly used in social science research, including frequency tables, histograms, box plots, regression models, mixed effects models, PCA, correlation matrices and cluster analyses. It supports installation via CRAN for stable releases or through GitHub for development versions, with documentation and examples available online. The package is licensed under GPL-3 and developed by Daniel Lüdecke, offering functions to create visualisations such as scatter plots, Likert scales and interaction effect plots, along with tools for constructing index variables and presenting statistical outputs in tabular formats.
By offering a centralised approach to theming and enabling automatic adaptation of plot styles within Shiny applications, the {thematic} package simplifies the styling of R graphics, including {ggplot2}, {lattice} and base R plots, R Markdown documents and RStudio. It allows users to apply consistent visual themes across different plotting systems, with auto-theming in Shiny and R Markdown relying on CSS and {bslib} themes, respectively. Installation requires specific versions of dependent packages such as {shiny} and {rmarkdown}, while custom fonts benefit from {showtext} or {ragg}. Users can set global defaults for background, foreground and accent colours, as well as fonts, which can be overridden with plot-specific theme adjustments. The package also defines default colour scales for qualitative and sequential data and integrates with tools like bslib to import Google Fonts, enhancing visual consistency across different environments and user interfaces.
Publishing Tools
The R ecosystem goes beyond mere graphical and tabular display production to offer means for taking things much further, often offering platforms for publishing your work. These can be used locally too, so there is no need to entrust everything to a third-party provider. The uses are endless for what is available, and it appears that Posit has used this to help with building documentation and training too.
What you have here is one of those distinguishing facilities of the R ecosystem, particularly for those wanting to share their analysis work with more than a hint of reproducibility. The tool combines narrative text and code to generate various outputs, supporting multiple programming languages and formats such as HTML, PDF and dashboards. It enables users to produce reports, presentations and interactive applications, with options for publishing and scheduling through platforms like RStudio Connect, facilitating collaboration and distribution of results in professional settings.
Distill for R Markdown is a tool designed to streamline the creation of technical documents, offering features such as code folding, syntax highlighting and theming. It builds on existing frameworks like Pandoc, MathJax and D3, enabling the production of dynamic, interactive content. Users can customise the appearance with CSS and incorporate appendices for supplementary information. The tool acknowledges the contributions of developers who created foundational libraries, ensuring accessibility and functionality for a wide audience. Its design prioritises clarity, allowing authors to focus on presenting results rather than underlying code, while maintaining flexibility for those who wish to include detailed explanations.
For a while, this was one of R's unique selling points, and remains as compelling a reason to use the language even when Python has got its own version of the package. Enabling the creation of interactive web applications for data analysis without requiring web development expertise allows users to build interfaces that let others explore data through dynamic visualisations and filters. Here is a simple example: an app that generates scatter plots with adjustable variables, species filters and marginal plots, hosted either on personal servers or through a dedicated hosting service.
The {bslib} R package offers a modern user interface toolkit for Shiny and R Markdown applications, leveraging Bootstrap to enable the creation of customisable dashboards and interactive theming. It supports the use of updated Bootstrap and Bootswatch versions while maintaining compatibility with existing defaults, and provides tools for real-time visual adjustments. Installation is available through CRAN, with example previews demonstrating its capabilities.
Enabling users to manipulate and validate data within a spreadsheet-like interface, the {rhandsontable} package introduces an interactive data grid for R. It supports features such as custom cell rendering, validation rules and integration with Shiny applications. When used in Shiny, the widget requires explicit conversion of data using the hot_to_r function, as updates may not be immediately reflected in reactive contexts. Examples demonstrate its application in various scenarios, including date editing, financial calculations and dynamic visualisations linked to charts. The package also accommodates bookmarks in Shiny apps with specific handling. Users are encouraged to report issues or contribute improvements, with guidance provided for those seeking to expand its functionality. The development team welcomes feedback to refine the tool further, ensuring it aligns with evolving user needs.
{xaringanExtra} offers a range of enhancements and extensions for creating and presenting slides with xaringan, enabling features such as adding an overview tile view, making slides editable, broadcasting in real time, incorporating animations, embedding live video feeds and applying custom styles. It allows users to selectively activate individual tools or load multiple features simultaneously through a single function call, supporting tasks like adding banners, enabling code copying, fitting slides to screen dimensions and integrating utility toolkits. The package is available for installation via CRAN or GitHub, providing flexibility for developers and presenters seeking to expand the functionality of their slides.
Online R programming books that are worth bookmarking
As part of making content more useful following its reorganisation, numerous articles on the R statistical computing language have appeared on here. All of those have taken a more narrative form. With this collation of online books on the R language, I take a different approach. What you find below is a collection of links with associated descriptions. While narrative accounts can be very useful, there is something handy about running one's eye down a compilation as well. Many entries have a corresponding print edition, some of which are not cheap to buy, which makes me wonder about the economics of posting the content online as well, though it can help with getting feedback during book preparation.
We start with this comprehensive collection of over 400 free and affordable resources related to the R programming language, organised into categories such as data science, statistics, machine learning and specific fields like economics and life sciences. In many ways, it is a superset of what you find below and complements this collection with many other finds. The fact that it is a living collection makes it even more useful.
R Programming for Data Science
Here is an introduction to the R programming language, focusing on its application in data science. It covers foundational topics such as installation, data manipulation, function writing, debugging and code optimisation, alongside advanced concepts like parallel computation and data analysis case studies. The text includes practical guidance on handling data structures, using packages such as {dplyr} and {readr} as well as working with dates, times and regular expressions. Additional sections address control structures, scoping rules and profiling techniques, while the author also discusses resources for staying updated through a podcast and accessing e-book versions for ongoing revisions.
Designed for individuals with no prior coding experience, the book provides an introduction to programming in R while using practical examples to teach fundamental concepts such as data manipulation, function creation and the use of R's environment system. It is structured around hands-on projects, including simulations of weighted dice, playing cards and a slot machine, alongside explanations of core programming principles like objects, notation, loops and performance optimisation. Additional sections cover installation, package management, data handling and debugging techniques. While the book is written using RMarkdown and published under a Creative Commons licence, a physical edition is available through O’Reilly.
What you have here is one of several books written by Hadley Wickham. This one is published in its second edition as part of Chapman and Hall's R Series and is aimed primarily at R users who want to deepen their programming skills and understanding of the language, though it is also useful for programmers migrating from other languages. The book covers a broad range of topics organised into sections on foundations, functional programming, object-oriented programming, metaprogramming and techniques, with the latter including debugging, performance measurement and rewriting R code in C++.
Unlike Paul Teetor's separately published R Cookbook, the Cookbook for R was created by Winston Chang. It offers solutions to common tasks and problems in data analysis, covering topics such as basic operations, numbers, strings, formulas, data input and output, data manipulation, statistical analysis, graphs, scripts and functions, and tools for experiments.
The second edition of R for Data Science by Hadley Wickham, Mine Çetinkaya-Rundel and Garrett Grolemund offers a structured approach to learning data science with R, covering essential skills such as data visualisation, transformation, import, programming and communication. Organised into chapters that explore workflows, data manipulation techniques and tools like Quarto for reproducible research, the book emphasises practical applications and best practices for handling data effectively.
The R Graphics Cookbook, 2nd edition, offers a comprehensive guide to creating visualisations in R, structured into chapters that cover foundational skills such as installing and using packages, loading data from various formats and exploring datasets through basic plots. It progresses to detailed techniques for constructing bar graphs, line graphs, scatter plots and histograms, alongside methods for customising axes, annotations, themes and legends.
The book also addresses advanced topics like colour application, faceting data into subplots, generating specialised graphs such as network diagrams and heat maps and preparing data for visualisation through reshaping and summarising. Additional sections focus on refining graphical outputs for presentation, including exporting to different file formats and adjusting visual elements for clarity and aesthetics, while an appendix provides an overview of the {ggplot2} system.
R Markdown: The Definitive Guide
Published by Chapman & Hall/CRC, R Markdown: The Definitive Guide by Yihui Xie, J.J. Allaire and Garrett Grolemund covers the R Markdown document format, which has been in use since 2012 and is built on the knitr and Pandoc tools. The format allows users to embed code within Markdown documents and compile the results into a range of output formats including PDF, HTML and Word. The guide covers a broad scope of practical applications, from creating presentations, dashboards, journal articles and books to building interactive applications and generating blogs, reflecting how the ecosystem has matured since the {rmarkdown} package was first released in 2014.
A key principle running throughout is that Markdown's deliberately limited feature set is a strength rather than a drawback, encouraging authors to focus on content rather than complex typesetting. Despite this simplicity, the format remains highly customisable through tools such as Pandoc templates, LaTeX and CSS. Documents produced in R Markdown are also notably portable, as their straightforward syntax makes conversion between output formats more reliable, and because results are generated dynamically from code rather than entered manually, they are far more reproducible than those produced through conventional copy-and-paste methods.
The R Markdown Cookbook is a practical guide designed to help users enhance their ability to create dynamic documents by combining analysis and reporting. It covers essential topics such as installation, document structure, formatting options and output formats like LaTeX, HTML and Word, while also addressing advanced features such as customisations, chunk options and integration with other programming languages. The book provides step-by-step solutions to common tasks, drawing on examples from online resources and community discussions to offer clear, actionable advice for both new and experienced users seeking to improve their workflow and explore the full potential of R Markdown.
This book provides a practical guide to using R Markdown for scientists, developed from a three-hour workshop and designed to evolve as a living resource. It covers essential topics such as setting up R Markdown documents, integrating with RStudio for efficient workflows, exporting outputs to formats like PDF, HTML and Word, managing figures and tables with dynamic references and captions, incorporating mathematical equations, handling bibliographies with citations and style adjustments, troubleshooting common issues and exploring advanced R Markdown extensions.
bookdown: Authoring Books and Technical Documents with R Markdown
Here is a guide to using the {bookdown} package, which extends R Markdown to facilitate the creation of books and technical documents. It covers Markdown syntax, integration of R code, formatting options for HTML, LaTeX and e-book outputs and features such as cross-referencing, custom blocks and theming. The package supports both multipage and single-document outputs, and its applications extend beyond traditional books to include course materials, manuals and other structured content. The work includes practical examples, publishing workflows and details on customisation, alongside information about licensing and the availability of a printed version.
[blogdown]: Creating Websites with R Markdown
Though the authors note that some information may be outdated due to recent updates to Hugo and the {blogdown} package, and they direct readers to additional resources for the latest features and changes, this book still provides a guide to building static websites using R Markdown and the Hugo static site generator, emphasising the advantages of this approach for creating reproducible, portable content. It covers installation, configuration, deployment options such as Netlify and GitHub Pages, migration from platforms like WordPress and advanced topics including custom layouts and version control as well as practical examples, workflow recommendations and discussions on themes, content management and technical aspects of website development.
[pagedown]: Create Paged HTML Documents for Printing from R Markdown
The R package {pagedown} enables users to create paged HTML documents suitable for printing to PDF, using R Markdown combined with a JavaScript library called paged.js, that later of which implements W3C specifications for paged media. While tools like LaTeX and Microsoft Word have traditionally dominated PDF production, pagedown offers an alternative approach through HTML and CSS, supporting a range of document types including resumes, posters, business cards, letters, theses and journal articles.
Documents can be converted to PDF via Google Chrome, Microsoft Edge or Chromium, either manually or through the chrome_print() function, with additional support for server-based, CI/CD pipeline and Docker-based workflows. The package provides customisable CSS stylesheets, a CSS overriding mechanism for adjusting fonts and page properties, and various formatting features such as lists of tables and figures, abbreviations, footnotes, line numbering, page references, cover images, running headers, chapter prefixes and page breaks. Previewing paged documents requires a local or remote web server, and the layout is sensitive to browser zoom levels, with 100% zoom recommended for the most accurate output.
Dynamic Documents with R and knitr
Developed by Yihui Xie and inspired by the earlier {Sweave} package, {knitr} is an R package designed for dynamic report generation that consolidates the functionality of numerous other add-on packages into a single, cohesive tool. It supports multiple input languages, including R, Python and shell scripts, as well as multiple output markup languages such as LaTeX, HTML, Markdown, AsciiDoc and reStructuredText. The package operates on a principle of transparency, giving users full control over how input and output are handled, and runs R code in a manner consistent with how it would behave in a standard R terminal.
Among its notable features are built-in caching, automatic code formatting via the {formatR} package, support for more than 20 graphics devices and flexible options for managing plots within documents. It also allows advanced users to define custom hooks and regular expressions to extend and tailor its behaviour further. The package is affiliated with the Foundation for Open Access Statistics, a nonprofit organisation promoting free software, open access publishing and reproducible research in statistics.
Mastering Shiny is a comprehensive guide to developing web applications using R, focusing on the Shiny framework designed for data scientists. It introduces core concepts such as user interface design, reactive programming and dynamic content generation, while also exploring advanced topics like performance optimisation, security and modular app development. The book covers practical applications across industries, from academic teaching tools to real-time analytics dashboards, and aims to equip readers with the skills to build scalable, maintainable applications. It includes detailed chapters on workflow, layout, visualisation and user interaction, alongside case studies and technical best practices.
Engineering Production-Grade Shiny Apps
This is aimed at developers and team managers who already possess a working knowledge of the Shiny framework for R and wish to advance beyond the basics toward building robust, production-ready applications. Rather than covering introductory Shiny concepts or post-deployment concerns, the book focuses on the intermediate ground between those two stages, addressing project management, workflow, code structure and optimisation.
It introduces the {golem} package as a central framework and guides readers through a five-step workflow covering design, prototyping, building, strengthening and deployment, with additional chapters on optimisation techniques including R code performance, JavaScript integration and CSS. The book is structured to serve both those with project management responsibilities and those focused on technical development, acknowledging that in many small teams these roles are carried out by the same individual.
Outstanding User Interfaces with Shiny
Written by David Granjon and published in 2022, Outstanding User Interfaces with Shiny is a book aimed at filling the gap between beginner and advanced Shiny developers, covering how to deeply customise and enhance Shiny applications to the point where they become indistinguishable from classic web applications. The book spans a wide range of topics, including working with HTML and CSS, integrating JavaScript, building Bootstrap dashboard templates, mobile development and the use of React, providing a comprehensive resource that consolidates knowledge and experience previously scattered across the Shiny developer community.
Now in its second edition, R Packages by Hadley Wickham and Jennifer Bryan is a freely available online guide that teaches readers how to develop packages in R. A package is the core unit of shareable and reproducible R code, typically comprising reusable functions, documentation explaining how to use them and sample data. The book guides readers through the entire process of package development, covering areas such as package structure, metadata, dependencies, testing, documentation and distribution, including how to release a package to CRAN. The authors encourage a gradual approach, noting that an imperfect first version is perfectly acceptable provided each subsequent version improves on the last.
Written by Javier Luraschi, Kevin Kuo and Edgar Ruiz, Mastering Spark with R is a comprehensive guide designed to take readers from little or no familiarity with Apache Spark or R through to proficiency in large-scale data science. The book covers a broad range of topics, including data analysis, modelling, pipelines, cluster management, connections, data handling, performance tuning, extensions, distributed computing, streaming and contributing to the Spark ecosystem.
Happy Git and GitHub for the useR
Here is a practical guide written by Jenny Bryan and contributors, aimed primarily at R users involved in data analysis or package development. It covers the installation and configuration of Git alongside GitHub, the development of key workflows for common tasks and the integration of these tools into day-to-day work with R and R Markdown. The guide is structured to take readers from initial setup through to more advanced daily workflows, with particular attention paid to how Git and GitHub serve the needs of data science rather than pure software development.
Written by John Coene and intended for release as part of the CRC Press R series, JavaScript for R explore how the R programming language and JavaScript can be used together to enhance data science workflows. Rather than teaching JavaScript as a standalone language, the book demonstrates how a limited working knowledge of it can meaningfully extend what R developers can achieve, particularly through the integration of external JavaScript libraries.
The book covers a broad range of topics, progressing from foundational concepts through to data visualisation using the {htmlwidgets} package, bidirectional communication with Shiny, JavaScript-powered computations via the V8 engine and Node.js and the use of modern JavaScript tools such as Vue, React and webpack alongside R. Practical examples are woven throughout, including the building of interactive visualisations, custom Shiny inputs and outputs, image classification and machine learning operations, with all accompanying code made publicly available on GitHub.
This guide addresses challenges faced by developers of R packages that interact with web resources, offering strategies to create reliable unit tests despite dependencies on internet connectivity, authentication and external service availability. It explores tools such as {vcr}, {webmockr}, {httptest} and {webfakes}, which enable mocking and recording HTTP requests to ensure consistent testing environments, reduce reliance on live data and improve test reliability. The text also covers advanced topics like handling errors, securing tests and ensuring compatibility with CRAN and Bioconductor, while emphasising best practices for maintaining test robustness and contributor-friendly workflows. Funded by rOpenSci and the R Consortium, the resource aims to support developers in building more resilient and maintainable R packages through structured testing approaches.
The Shiny AWS Book is an online resource designed to teach data scientists how to deploy, host and maintain Shiny web applications using cloud infrastructure. Addressing a common gap in data science education, it guides readers through a range of DevOps technologies including AWS, Docker, Git, NGINX and open-source Shiny Server, covering everything from server setup and cost management to networking, security and custom configuration.
{ggplot2}: Elegant Graphics for Data Analysis
The third edition of {ggplot2}: Elegant Graphics for Data Analysis provides an in-depth exploration of the Grammar of Graphics framework, focusing on the theoretical foundations and detailed implementation of the ggplot2 package rather than offering step-by-step instructions for specific visualisations. Written by Hadley Wickham, Danielle Navarro and Thomas Lin Pedersen, the book is presented as an online work-in-progress, with content structured across sections such as layers, scales, coordinate systems and advanced programming topics. It aims to equip readers with the knowledge to customise plots according to their needs, rather than serving as a direct guide for creating predefined graphics.
YaRrr! The Pirate’s Guide to R
Written by Nathaniel D. Phillips, this is a beginner-oriented guide to learning the R programming language from the ground up, covering everything from installation and basic navigation of the RStudio environment through to more advanced topics such as data manipulation, statistical analysis and custom function writing. The guide progresses logically through foundational concepts including scalars, vectors, matrices and dataframes before moving into practical areas such as hypothesis testing, regression, ANOVA and Bayesian statistics. Visualisation is given considerable attention across dedicated chapters on plotting, while later sections address loops, debugging and managing data from a variety of file formats. Each chapter includes practical exercises to reinforce learning, and the book concludes with a solutions section for reference.
Data Visualisation: A Practical Introduction
Data Visualisation: A Practical Introduction is a forthcoming second edition from Princeton University Press, written by Kieran Healy and due for release in March 2026, which teaches readers how to explore, understand and present data using the R programming language and the {ggplot2} library. The book aims to bridge the gap between works that discuss visualisation principles without teaching the underlying tools and those that provide code recipes without explaining the reasoning behind them, instead combining both practical instruction and conceptual grounding.
Revised and updated throughout to reflect developments in R and {ggplot2}, the second edition places greater emphasis on data wrangling, introduces updated and new datasets, and substantially rewrites several chapters, particularly those covering statistical models and map-drawing. Readers are guided through building plots progressively, from basic scatter plots to complex layered graphics, with the expectation that by the end they will be able to reproduce nearly every figure in the book and understand the principles that inform each choice.
The book also addresses the growing role of large language models in coding workflows, arguing that genuine understanding of what one is doing remains essential regardless of the tools available. It is suitable for complete beginners, those with some prior R experience, and instructors looking for a course companion, and requires the installation of R, RStudio and a number of supporting packages before work can begin.
Learning R for Data Analysis: Going from the basics to professional practice
R has grown from a specialist statistical language into one of the most widely recognised tools for working with data. Across tutorials, community sites, training platforms and industry resources, it is presented as both a programming language and a software environment for statistical computing, graphics and reporting. It was created by Ross Ihaka and Robert Gentleman at the University of Auckland in New Zealand, and its name draws on the first letter of their first names while also alluding to the Bell Labs language S. It is freely available under the GNU General Public Licence and runs on Linux, Windows and macOS, which has helped it spread across research, education and industry alike.
What Makes R Distinctive
What makes R notable is its combination of programming features with a strong focus on data analysis. Introductory material, such as the tutorials at Tutorialspoint and Datamentor, repeatedly highlights its support for conditionals, loops, user-defined recursive functions and input and output, but these sit alongside effective data handling, a broad set of operators for arrays, lists, vectors and matrices and strong graphical capabilities. That mixture means R can be used for straightforward scripts and for complex analytical workflows. A beginner may start by printing "Hello, World!" with the print() function, while a more experienced user may move on to regression models, interactive dashboards or automated reporting.
The Learning Progression
Learning materials generally present R in a structured progression. A beginner is first introduced to reserved words, variables and constants, operators and the order in which expressions are evaluated. From there, the path usually moves into flow control through if…else, ifelse(), for, while, repeat and the use of break and next, before functions follow naturally, including return values, environments and scope, recursive functions, infix operators and switch(). Most sources agree that confidence with the syntax and fundamentals is the real starting point, and this early sequence matters because it helps learners become comfortable reading and writing R rather than only copying examples.
After the basics, attention tends to turn to the structures that make R so useful for data work. Vectors, matrices, lists, data frames and factors appear in nearly every introductory course because they are central to how information is stored and manipulated. Object-oriented concepts also emerge quite early in some routes through the language, with classes and objects extending into S3, S4 and reference classes. For someone coming from spreadsheets or point-and-click statistical software, this shift can feel significant, but it also opens the way to more reproducible and flexible analysis.
Visualisation
Visualisation is another recurring theme in R education. Basic chart types such as bar plots, histograms, pie charts, box plots and strip charts are common early examples because they show how quickly data can be turned into graphics. More advanced lessons widen the scope through plot functions, multiple plots, saving graphics, colour selection and the production of 3D plots.
Beyond base plotting, there is extensive evidence of the central role of {ggplot2} in contemporary R practice. Data Cornering demonstrates this well, with articles covering how to create funnel charts in R using {ggplot2} and how to diversify stacked column chart data label colours, showing how R is used not only to summarise data but also to tell visual stories more clearly. In the pharmaceutical and clinical research space, the PSI VIS-SIG blog is published by the PSI Visualisation Special Interest Group and summarises its monthly Wonderful Wednesday webinars, presenting real-world datasets and community-contributed chart improvements alongside news from the group.
Data Wrangling and the Tidyverse
Much of modern R work is built around data wrangling, and here the {tidyverse} has become especially prominent. Claudia A. Engel's openly published guide Data Wrangling with R (last updated 3rd November 2023) sets out a preparation phase that assumes some basic R knowledge, a recent installation of R and RStudio and the installation of the {tidyverse} package with install.packages("tidyverse") followed by library(tidyverse). It also recommends creating a dedicated RStudio project and downloading CSV files into a data subdirectory, reinforcing the importance of organised project structure.
That same guide then moves through data manipulation with {dplyr}, covering selecting columns and filtering rows, pipes, adding new columns, split-apply-combine, tallying and joining two tables, before moving on to {tidyr} topics such as long and wide table formats, pivot_wider, pivot_longer and exporting data. These topics reflect a broader pattern in the R ecosystem because data import and export, reshaping, combining tables and counting by group recur across teaching resources as they mirror common analytical tasks.
Applications and Professional Use
The range of applications attached to R is wide, though data science remains the clearest centre of gravity. Educational sources describe R as valuable for data wrangling, visualisation and analysis, often pointing to packages such as {dplyr}, {tidyr}, {ggplot2} and {Shiny}. Statistical modelling is another major strand, with R offering extensible techniques for descriptive and inferential statistics, regression analysis, time series methods and classical tests. Machine learning appears as a further area of growth, supported by a large and expanding package ecosystem. In more advanced contexts, R is also linked with dashboards, web applications, report generation and publishing systems such as Quarto and R Markdown.
R's place in professional settings is underscored by the breadth of organisations and sectors associated with it. Introductory resources mention companies such as Google, Microsoft, Facebook, ANZ Bank, Ford and The New York Times as examples of organisations using R for modelling, forecasting, analysis and visualisation. The NHS-R Community promotes the use of R and open analytics in health and care, building a community of practice for data analysis and data science using open-source software in the NHS and wider UK health and care system. Its resources include reports, blogs, webinars and workshops, books, videos and R packages, with webinar materials archived in a publicly accessible GitHub repository. The R Validation Hub, supported through the pharmaR initiative, is a collaboration to support the adoption of R within a biopharmaceutical regulatory setting and provides tools including the {riskmetric} package, the {riskassessment} app and the {riskscore} package for assessing package quality and risk.
The Wider Ecosystem
The wider ecosystem around R is unusually rich. The R Consortium promotes the growth and development of the R language and its ecosystem by supporting technical and social infrastructure, fostering community engagement and driving industry adoption. It notes that the R language supports over two million users and has been adopted in industries including biotech, finance, research and high technology. Community growth is visible not only through organisations and conferences but through user groups, scholarships, project working groups and local meetups, which matters because learning a language is easier when there is an active support network around it.
Another sign of maturity is the depth of R's package and publication landscape. rdrr.io provides a comprehensive index of over 29,000 CRAN packages alongside more than 2,100 Bioconductor packages, over 2,200 R-Forge packages and more than 76,000 GitHub packages, making it possible to search for packages, functions, documentation and source code in one place. Rdocumentation, powered by DataCamp, covers 32,130 packages across CRAN and Bioconductor and offers a searchable interface for function-level documentation. The Journal of Statistical Software adds a scholarly dimension, publishing open-access articles on statistical computing software together with source code, with full reproducibility mandatory for publication. R-bloggers aggregates R news and tutorials contributed by hundreds of R bloggers, while R Weekly curates a community digest and an accompanying podcast, both helping users keep pace with the steady flow of tutorials, package releases, blog posts and developments across the R world.
Where to Begin
For beginners, one recurring challenge is knowing where to start, and different learning routes reflect different backgrounds. Datamentor points learners towards step-by-step tutorials covering popular topics such as R operators, if...else statements, data frames, lists and histograms, progressing through to more advanced material. R for the Rest of Us offers a staged path through three core courses, Getting Started With R, Fundamentals of R and Going Deeper with R, and extends into nine topics courses covering Git and GitHub, making beautiful tables, mapping, graphics, data cleaning, inferential statistics, package development, reproducibility and interactive dashboards with {Shiny}. The site is explicitly designed for people who may never have coded before and also offers the structured R in 3 Months programme alongside training and consulting. RStudio Education (now part of Posit) outlines six distinct ways to begin learning R, covering installation, a free introductory webinar on tidy statistics, the book R for Data Science, browser-based primers, and further options suited to different learning styles, along with guidance on R Markdown and good project practices.
Despite the variety, the underlying advice is consistent: start by learning the basics well enough to read and write simple code, practise regularly beginning with straightforward exercises and gradually take on more complex tasks, then build projects that matter to you because projects create context and make concepts stick. There is no suggestion that mastery comes from passively reading documentation alone, as practical engagement is treated as essential throughout. The blog Stats and R exemplifies this philosophy well, with the stated aim of making statistics accessible to everyone by sharing, explaining and illustrating statistical concepts and, where appropriate, applying them in R.
That practical engagement can take many forms. Someone interested in data journalism may focus on visualisation and reproducible reporting, while a researcher may prioritise statistical modelling and publishing workflows, and a health analyst may use R for quality assurance, open health data and clinical reporting. Others may work with {Shiny}, package development, machine learning, Git and GitHub or interactive dashboards. The variety shows that R is not confined to a single use case, even if statistics and data science remain the common thread.
Free Learning Resources for R
It is also worth noting that R learning is supported by a great deal of freely available material. Statistics Globe, founded in 2017 by Joachim Schork and now an education and consulting platform, offers more than 3,000 free tutorials and over 1,000 video tutorials on YouTube, spanning R programming, Python and statistical methodology. STHDA (Statistical Tools for High-Throughput Data Analysis) covers basics, data import and export, reshaping, manipulation and visualisation, with material geared towards practical data analysis at every level. Community sites, webinar repositories and newsletters add further layers of accessibility, and even where paid courses exist, the surrounding free ecosystem is substantial.
Taken together, these sources present R as far more than a niche programming language. It is a mature open-source environment with a strong statistical heritage, a practical orientation towards data work and a well-developed community of learners, teachers, developers and organisations. Its core concepts are approachable enough for beginners, yet its package ecosystem and publishing culture support highly specialised and advanced work. For anyone looking to enter data analysis, statistics, visualisation or related areas, R offers a route that begins with simple code and can extend into large-scale analytical workflows.
From summary statistics to published reports with R, LaTeX and TinyTeX
For anyone working across LaTeX, R Markdown and data analysis in R, there comes a point where separate tools begin to converge. Data has to be summarised, those summaries have to be turned into presentable tables and the finished result has to compile into a report that looks appropriate for its audience rather than a console dump. These notes follow that sequence, moving from the practical business of summarising data in R through to tabulation and then on to the publishing infrastructure that makes clean PDF and Word output possible.
Summarising Data with {dplyr}
The starting point for many analyses is a quick exploration of the data at hand. One useful example uses the anorexia dataset from the {MASS} package together with {dplyr}. The dataset contains weight change data for young female anorexia patients, divided into three treatment groups: Cont for the control group, CBT for cognitive behavioural treatment and FT for family treatment.
The basic manipulation starts by loading {MASS} and {dplyr}, then using filter() to create separate subsets for each treatment group. From there, mutate() adds a wtDelta column defined as Postwt - Prewt, giving the weight change for each patient. group_by(Treat) prepares the data for grouped summaries, and arrange(wtDelta) sorts within treatment groups. The notes then show how {dplyr}'s pipe operator, %>%, makes the workflow more readable by chaining these operations. The final summary table uses summarize() to compute the number of observations, the mean weight change and the standard deviation within each treatment group. The reported values are count 29, average weight change 3.006897 and standard deviation 7.308504 for CBT, count 26, average weight change -0.450000 and standard deviation 7.988705 for Cont and count 17, average weight change 7.264706 and standard deviation 7.157421 for FT.
That example is not presented as a complete statistical analysis. Instead, it serves as a quick exploratory route into the data, with the wording remaining appropriately cautious and noting that this is only a glance and not a rigorous analysis.
Choosing an R Package for Descriptive Summaries
The question of how best to summarise data opens up a broader comparison of R packages for descriptive statistics. A useful review sets out a common set of needs: a count of observations, the number and types of fields, transparent handling of missing data and sensible statistics that depend on the data type. Numeric variables call for measures such as mean, median, range and standard deviation, perhaps with percentiles. Categorical variables call for counts of levels and some sense of which categories dominate.
Base R's summary() does some of this reasonably well. It distinguishes categorical from numeric variables and reports distributions or numeric summaries accordingly, while also highlighting missing values. Yet, it does not show an overall record count, lacks standard deviation and is not especially tidy or ready for tools such as kable. Several contributed packages aim to improve on that. Hmisc::describe() gives counts of variables and observations, handles both categorical and numerical data and reports missing values clearly, showing the highest and lowest five values for numeric data instead of a simple range. pastecs::stat.desc() is more focused on numeric variables and provides confidence intervals, standard errors and optional normality tests. psych::describe() includes categorical variables but converts them to numeric codes by default before describing them, which the package documentation itself advises should be interpreted cautiously. psych::describeBy() extends this approach to grouped summaries and can return a matrix form with mat = TRUE.
Among the packages reviewed, {skimr} receives especially strong attention for balancing readability and downstream usefulness. skim() reports record and variable counts clearly, separates variables by type and includes missing data and standard summaries in an accessible layout. It also works with group_by() from {dplyr}, making grouped summaries straightforward to produce. More importantly for analytical workflows, the skim output can be treated as a tidy data frame in which each combination of variable and statistic is represented in long form, meaning the results can be filtered, transformed and plotted with standard tidyverse tools such as {ggplot2}.
{summarytools} is presented as another strong option, though with a distinction between its functions. descr() handles numeric variables and can be converted to a data frame for use with kable, while dfSummary() works across entire data frames and produces an especially polished summary. At the time of the original notes, dfSummary() was considered slow. The package author subsequently traced the issue, as documented in the same review, to an excessive number of histogram breaks being generated for variables with large values, imposing a limit to resolve it. The package also supports output through view(dfSummary(data)), which yields an attractive HTML-style summary.
Grouped Summary Table Packages
Once the data has been summarised, the next step is turning those summaries into formal tables. A detailed comparison covers a number of packages specifically designed for this purpose: {arsenal}, {qwraps2}, {Amisc}, {table1}, {tangram}, {furniture}, {tableone}, {compareGroups} and {Gmisc}. {arsenal} is described as highly functional and flexible, with tableby() able to create grouped tables in only a few lines and then be customised through control objects that specify tests, display statistics, labels and missing value treatment. {qwraps2} offers a lot of flexibility through nested lists of summary specifications, though at the cost of more code. {Amisc} can produce grouped tables and works with pander::pandoc.table(), but is noted as not being on CRAN. {table1} creates attractive tables with minimal code, though its treatment of missing values may not suit every use case. {tangram} produces visually appealing HTML output and allows custom rows such as missing counts to be inserted manually, although only HTML output is supported. {furniture} and {tableone} both support grouped table creation, but {tableone} in particular is notable because it is widely used in biomedical research for baseline characteristics tables.
The {tableone} package deserves separate mention because it is designed to summarise continuous and categorical variables in one table, a common need in medical papers. As the package introduction explains, CreateTableOne() can be used on an entire dataset or on a selected subset of variables, with factorVars specifying variables that are coded numerically but should be treated as categorical. The package can display all levels for categorical variables, report missing values via summary() and switch selected continuous variables to non-normal summaries using medians and interquartile ranges instead of means and standard deviations. For grouped comparisons, it prints p-values by default and can switch to non-parametric tests or Fisher's exact test where needed. Standardised mean differences can also be shown. Output can be captured as a matrix and written to CSV for editing in Excel or Word.
Styling and Exporting Tables
With tables constructed, the focus shifts to how they are presented and exported. As Hao Zhu's conference slides explain, the {kableExtra} package builds on knitr::kable() and provides a grammar-like approach to adding styling layers, importing the pipe %>% symbol from {magrittr} so that formatting functions can be added in the same way that layers are added in {ggplot2}. It supports themes such as kable_paper, kable_classic, kable_minimal and kable_material, as well as options for striping, hover effects, condensed layouts, fixed headers, grouped rows and columns, footnotes, scroll boxes and inline plots.
Table output is often the visible end of an analysis, and a broader review of R table packages covers a range of approaches that go well beyond the default output. In R Markdown, packages such as {gt}, {kableExtra}, {formattable}, {DT}, {reactable}, {reactablefmtr} and {flextable} all offer richer possibilities. Some are aimed mainly at HTML output, others at Word. {DT} in particular supports highly customised interactive tables with searching, filtering and cell styling through more advanced R and HTML code. {flextable} is highlighted as the strongest option when knitting to Word, given that the other packages are primarily designed for HTML.
For users working in Word-heavy settings, older but still practical workflows remain relevant too. One approach is simply to write tables to comma-separated text files and then paste and convert the content into a Word table. Another route is through {arsenal}'s write2 functions, designed as an alternative to SAS ODS. The convenience functions write2word(), write2html() and write2pdf() accept a wide range of objects: tableby, modelsum, freqlist and comparedf from {arsenal} itself, as well as knitr::kable(), xtable::xtable() and pander::pander_return() output. One notable constraint is that {xtable} is incompatible with write2word(). Beyond single tables, the functions accept a list of objects so that multiple tables, headers, paragraphs and even raw HTML or LaTeX can all be combined into a single output document. A yaml() helper adds a YAML header to the output, and a code.chunk() helper embeds executable R code chunks, while the generic write2() function handles formats beyond the three convenience wrappers, such as RTF.
The Publishing Infrastructure: CTAN and Its Mirrors
Producing PDF output from R Markdown depends on a working LaTeX installation, and the backbone of that ecosystem is CTAN, the Comprehensive TeX Archive Network. CTAN is the main archive for TeX and LaTeX packages and is supported by a large collection of mirrors spread around the world. The purpose of this distributed system is straightforward: users are encouraged to fetch files from a site that is close to them in network terms, which reduces load and tends to improve speed.
That global spread is extensive. The CTAN mirror list organises sites alphabetically by continent and then by country, with active sites listed across Africa, Asia, Europe, North America, Oceania and South America. Africa includes mirrors in South Africa and Morocco. Asia has particularly wide coverage, with many mirrors in China as well as sites in Korea, Hong Kong, India, Indonesia, Japan, Singapore, Taiwan, Saudi Arabia and Thailand. Europe is especially rich in mirrors, with hosts in Denmark, Germany, Spain, France, Italy, the Netherlands, Norway, Poland, Portugal, Romania, Switzerland, Finland, Sweden, the United Kingdom, Austria, Greece, Bulgaria and Russia. North America includes Canada, Costa Rica and the United States, while Oceania covers Australia and South America includes Brazil and Chile.
The details matter because different mirrors expose different protocols. While many support HTTPS, some also offer HTTP, FTP or rsync. CTAN provides a mirror multiplexer to make the common case simpler: pointing a browser to https://mirrors.ctan.org/ results in automatic redirection to a mirror in or near the user's country. There is one caveat. The multiplexer always redirects to an HTTPS mirror, so anyone intending to use another protocol needs to select manually from the mirror list. That is why the full listings still include non-HTTPS URLs alongside secure ones.
There is also an operational side to the network that is easy to overlook when things are working well. CTAN monitors mirrors to ensure they are current, and if one falls behind, then mirrors.ctan.org will not redirect users there. Updates to the mirror list can be sent to ctan@ctan.org. The master host of CTAN is ftp.dante.de in Cologne, Germany, with rsync access available at rsync://rsync.dante.ctan.org/CTAN/ and web access on https://ctan.org/. For those who want to contribute infrastructure rather than simply use it, CTAN also invites volunteers to become mirrors.
TinyTeX: A Lightweight LaTeX Distribution
This infrastructure becomes much more tangible when looking at a lightweight TeX distribution such as TinyTeX. TinyTeX is a lightweight, cross-platform, portable and easy-to-maintain LaTeX distribution based on TeX Live. It is small in size but intended to function well in most situations, especially for R users. Its appeal lies in not requiring users to install thousands of packages they will never use, installing them as needed instead. This also means installation can be done without administrator privileges, which removes one of the more familiar barriers around traditional TeX setups. TinyTeX can even be run from a flash drive.
For R users, TinyTeX is closely tied to the {tinytex} R package. The distinction is important: tinytex in lower case refers to the R package, while TinyTeX refers to the LaTeX distribution. Installation is intentionally direct. After installing the R package with install.packages('tinytex'), a user can run tinytex::install_tinytex(). Uninstallation is equally simple with tinytex::uninstall_tinytex(). For the average R Markdown user, that is often enough. Once TinyTeX is in place, PDF compilation usually requires no further manual package management.
There is slightly more to know if the aim is to compile standalone LaTeX documents from R. The {tinytex} package provides wrappers such as pdflatex(), xelatex() and lualatex(). These functions detect required LaTeX packages that are missing and install them automatically by default. In practical terms, that means a small example document can be written to a file and compiled with tinytex::pdflatex('test.tex') without much concern about whether every dependency has already been installed. For R users, this largely removes the old pattern of cryptic missing-package errors followed by manual searching through TeX repositories.
Developers may want more than the basics, and TinyTeX has a path for that as well. A helper such as tinytex:::install_yihui_pkgs() installs a collection of packages needed for building the PDF vignettes of many CRAN packages. That is a specific convenience rather than a universal requirement, but it illustrates the design philosophy behind TinyTeX: keep the initial footprint light and offer ways to add what is commonly needed later.
Using TinyTeX Outside R
For users outside R, TinyTeX still works, but the focus shifts to the command-line utility tlmgr. The documentation is direct in its assumptions: if command-line work is unwelcome, another LaTeX distribution may be a better fit. The central command is tlmgr, and much of TinyTeX maintenance can be expressed through it.
On Linux, installation places TinyTeX in $HOME/.TinyTeX and creates symlinks for executables such as pdflatex under $HOME/bin or $HOME/.local/bin if it exists. The installation script is fetched with wget and piped to sh, after first checking that Perl is correctly installed. On macOS, TinyTeX lives in ~/Library/TinyTeX, and users without write permission to /usr/local/bin may need to change ownership of that directory before installation. Windows users can run a batch file, install-bin-windows.bat, and the default installation directory is %APPDATA%/TinyTeX unless APPDATA contains spaces or non-ASCII characters, in which case %ProgramData% is used instead. PowerShell version 3.0 or higher is required on Windows.
Uninstallation follows the same self-contained logic. On Linux and macOS, tlmgr path remove is followed by deleting the TinyTeX folder. On Windows, tlmgr path remove is followed by removing the installation directory. This simplicity is a deliberate contrast with larger LaTeX distributions, which are considerably more involved to remove cleanly.
Maintenance and Package Management
Maintenance is where TinyTeX's relationship to CTAN and TeX Live becomes especially visible. If a document fails with an error such as File 'times.sty' not found, the fix is to search for the package containing that file with tlmgr search --global --file "/times.sty". In the example given, that identifies the psnfss package, which can then be installed with tlmgr install psnfss. If the package includes executables, tlmgr path add may also be needed. An alternative route is to upload the error log to the yihui/latex-pass GitHub repository, where package searching is carried out remotely.
If the problem is less obvious, a full update cycle is suggested: tlmgr update --self --all, then tlmgr path add and fmtutil-sys --all. R users have wrappers for these tasks too, including tlmgr_search(), tlmgr_install() and tlmgr_update(). Some situations still require a full reinstallation. If TeX Live reports Remote repository newer than local, TinyTeX should be reinstalled manually, which for R users can be done with tinytex::reinstall_tinytex(). Similarly, when a TeX Live release is frozen in preparation for a new one, the advice is simply to wait and then reinstall when the next release is ready.
The motivation behind TinyTeX is laid out with unusual clarity. Traditional LaTeX distributions often present a choice between a small basic installation that soon proves incomplete and a very large full installation containing thousands of packages that will never be used. TinyTeX is framed as a way around those frustrations by building on TeX Live's portability and cross-platform design while stripping away unnecessary size and complexity. The acknowledgements also underline that TinyTeX depends on the work of the TeX Live team.
Connecting the R Workflow to a Finished Report
Taken together, these notes show how closely summarisation, tabulation and publishing are linked. {dplyr} and related tools make it easy to summarise data quickly, while a wide range of R packages then turn those summaries into tables that are not only statistically useful but also presentable. CTAN and its mirrors keep the TeX ecosystem available and current across the world, and TinyTeX builds on that ecosystem to make LaTeX more manageable, especially for R users. What begins with a grouped summary in the console can end with a polished report table in HTML, PDF or Word, and understanding the chain between those stages makes the whole workflow feel considerably less mysterious.
From planning to production: Selected aspects of modern software delivery
Software delivery has never been more interlinked across strategy, planning and operations. Agile practices are adapting to hybrid work, AI is reshaping how teams plan and execute, and cloud platforms have become the default substrate for everything from build pipelines to runtime security. What follows traces a practical route through that terrain, drawing together current guidance, tools and community efforts so teams can make informed choices without having to assemble the big picture for themselves.
Work Management: Asana and Jira
Planning and coordination remain the foundation of any delivery effort, and the market still gravitates to two names for day-to-day project management: Asana and Jira. Each can bring order to multi-team projects and distributed work, yet they approach the job from very different histories.
With a history rooted in large DevOps teams and issue tracking, Jira carries that lineage into its Scrum and Kanban options, backlogs, sprints and a reporting catalogue that leans into metrics such as time in status, resolution percentages and created-versus-resolved trends. Built as a more general project manager from the outset, Asana shows its intent in the way users move from a decluttered home screen to “My Tasks”, switch among Kanban, Gantt and Calendar views using tabs, and add custom fields or rules from within the view rather than navigating to separate screens. The two now look similar at a glance, but their structure and presentation differ, and that influences how quickly a team settles into a rhythm.
Dashboards and Reporting
Those differences widen when examining dashboards and reporting. Jira allows users to create multiple dashboards and fill them with a large range of gadgets, including assigned issues, average time in status, bubble charts, heat maps and slideshows. The designs are sparse yet flexible, and administrators on company-managed accounts can add custom reporting, while the Atlassian Marketplace offers hundreds of additional reporting integrations.
By contrast, the home dashboard in Asana is intentionally pared back, with reports placed in their own section to keep personal task management separate from project or portfolio-level tracking. Its native reporting is broader and more polished out of the box, with pre-built views for progress, work health and resourcing, together with custom report creation that does not require admin-level access.
Interoperability
How well each tool connects to other systems also sets expectations. Jira, as part of Atlassian's suite, has a bustling marketplace with over a thousand apps for its cloud product, covering project management, IT service management, reporting and more. Asana's store is smaller, with under 400 apps at the time of writing, though it continues to grow and offers breadth across staples such as Slack, Teams and Adobe Creative Cloud, as well as a strong showing for IT and developer use cases.
Both tools connect to Zapier, which has also published a detailed comparison of the two platforms, opening pathways to thousands of further automations, such as creating Jira issues from Typeform submissions or making Asana tasks from Airtable records without writing integration code. In practice, many teams will get what they need natively and then extend in targeted ways, whether through marketplace add-ons or workflow automations.
Plans and AI
Plans and AI are where the most significant recent movement has occurred. On the Asana side, a free Personal tier leads into paid Starter and Advanced plans followed by Enterprise, with AI tools (branded "Asana Intelligence") included across paid plans. Those features help prioritise work, automate repetitive steps, suggest smart workflows and summarise discussions to reduce time spent on status communication.
Over at Jira, the structure runs from a free tier for small teams through Standard, Premium and Enterprise plans. "Atlassian Intelligence" focuses on generative support in the issue editor, AI summaries and AI-assisted sprint planning, adding predictive insights to help with resource allocation and automation. It is worth noting that Jira's entry-level paid plan appears cheaper on paper, but real-world total cost of ownership often rises once Marketplace apps, Confluence licences and security add-ons are factored in.
Choosing between the two typically comes down to need. If you want a task manager built for general use with crisp reporting and strong collaboration features, Asana presents itself clearly. If your roadmap lives and breathes Agile sprints, backlogs and issue workflows, and you need deep extensibility across a suite, Jira remains a natural fit.
Scrum: Back to Basics
Method matters as much as tooling. Scrum remains the most widely adopted Agile framework, and it is worth revisiting its essentials when translating plans into delivery. The DevOps Institute tracks the human side of this evolution, noting that skills, learning and collaboration are as central to DevOps success as the toolchain itself. A Scrum Team is cross-functional and self-organising, combining the Product Owner's focus on prioritising a transparent, value-ordered Product Backlog with a Development Team that turns backlog items into a potentially shippable increment every Sprint.
The Scrum Master keeps the framework alive, removes impediments, and coaches both the team and the wider organisation. Sprints run for no longer than four weeks and bundle Sprint Planning, Daily Scrums, a Sprint Review and a Retrospective, with online whiteboards increasingly used to run those ceremonies effectively across distributed and hybrid teams. The Sprint Goal provides a unifying target, and the Sprint Backlog breaks selected Product Backlog items into tasks and steps owned by the team.
Scrum Versus Waterfall
That cadence stands in deliberate contrast to classic waterfall approaches, where specification, design, implementation, testing and deployment proceed in long phases with significant hand-offs between each. Scrum replaces upfront specifications with user stories and collaborative refinement using the "three Cs" of Card, Conversation and Confirmation, so requirements can evolve alongside market needs. It places self-organisation ahead of management directives in deciding how work is done within a Sprint, and it raises transparency by making progress and problems visible every day rather than at phase gates.
Teams feel the shift when they commit to delivering a working increment each Sprint rather than aiming for a distant release, and when they see the cost of change flatten because feedback arrives through Reviews and Retrospectives rather than months after decisions have been made.
The State of Agile
Richer context for these shifts appears in longitudinal views of industry practice. The 18th State of Agile Report, published by Digital.ai in late 2025, observes that Agile is adapting rather than fading, with adoption remaining widespread while many organisations rebuild from the ground up to focus on measurable outcomes. The report, drawing on responses from approximately 350 practitioners, notes that AI and automation are accelerating change while introducing fresh expectations around data quality, decision-making and governance, and it emphasises that outcomes have become the currency connecting strategy, planning and execution.
That aligns with the Agile Alliance's ongoing work to re-examine Agile's core values for enterprise settings, as well as with the joint Manifesto for Enterprise Agility initiative with PMI{:target="_blank"}, which argues for adaptability as a strategic advantage rather than a team-level method choice. Significantly, the 18th report found that only 13% of respondents say Agile is deeply embedded across their business, and that only 15% of business leaders participate meaningfully in Agile practices, suggesting that leadership alignment remains one of the most persistent blockers to realising the framework's full potential.
Continuous Delivery and CI/CD Tooling
Getting from plan to production relies on engineering foundations that have matured alongside Agile. Continuous Delivery reframes deployment as a safe, rapid and sustainable capability by keeping code in a deployable state and eliminating the traditional post-"dev complete" phases of integration, testing and hardening. By building deployment pipelines that automate build, environment provisioning and regression testing, teams reduce risk, shorten lead time and can redirect human effort towards exploratory, usability, performance and security testing throughout delivery, not just at the end.
The results can be counterintuitive. High-performing teams deploy more frequently and more reliably, even in regulated settings because painful activities are made routine and small batches make feedback economical.
CI/CD in Practice
Contemporary CI/CD tools express that philosophy in developer-centred ways. Travis CI can often be described in minutes using minimal YAML configuration, specifying runtimes, caching dependencies, parallelising jobs and running tests across multiple language versions. Azure Pipelines, GitHub Actions and Azure DevOps provide similar capabilities at broader scale, with managed runners, gated releases, integrated artefact feeds, security scanning and policy controls that matter in larger enterprises.
The emphasis across these platforms is on speed to first pipeline, consistency across environments and adding guardrails such as signed artefacts, scoped credentials and secret management, so that velocity does not undercut safety.
Cloud Native Architecture
Architecture and platform choices amplify or constrain delivery flow. The cloud native ecosystem, curated by the Cloud Native Computing Foundation (CNCF) under the Linux Foundation, has become the common bedrock for organisations standardising on Kubernetes, service meshes and observability stacks. Hosting more than 200 projects across sandbox, incubating and graduated maturity levels, it spans everything from container orchestration to policy and tracing, and brings together vendors, end users and maintainers at events such as KubeCon + CloudNativeCon.
Sitting higher up the stack, Knative is a recent CNCF graduate that provides building blocks for HTTP-first, event-driven serverless workloads on Kubernetes. It unifies serving and eventing, so teams can scale to zero on demand while routing asynchronous events with the same fluency as web requests, and was created at Google before joining the CNCF as an incubating project and subsequently reaching graduation status. For teams that need to manage the underlying cluster infrastructure declaratively, Cluster API provides a Kubernetes-native way to provision, upgrade and operate clusters across cloud and on-premises environments, bringing the same declarative model used for application workloads to the infrastructure layer itself.
APIs and Developer Ecosystems
API-driven integration is part of the cloud native picture rather than an afterthought. The API Landscape compiled by Apidays shows the sheer diversity of stakeholders and tools across the programmable economy, from design and testing to gateways, security and orchestration. Developer ecosystems such as Cisco DevNet bring this to ground level by offering documentation, labs, sample code and sandboxes across networking, security and collaboration products, encouraging infrastructure as code with tools like Terraform and Ansible.
Version control and collaboration sit at the centre of modern delivery, and GitHub's documentation, spanning everything from Codespaces to REST and GraphQL APIs, reflects that centrality. The breadth of what is available through a single platform, from repository management to CI/CD workflows and AI-assisted coding, illustrates how much of the delivery stack can now be coordinated in one place.
Security: An End-to-End Discipline
Security threads through every layer and is increasingly treated as an end-to-end discipline rather than a late-stage gate. The Open-Source Security Foundation (OpenSSF) coordinates community efforts to secure open-source software for the public good, spanning working groups on AI and machine learning security, supply chain integrity and vulnerability disclosure, and offering guides, courses and annual reviews.
On the cloud side, a Cloud-Native Application Protection Platform (CNAPP) consolidates capabilities to protect applications across multi-cloud estates. Core components typically include Cloud Infrastructure Entitlement Management (to rein in excessive permissions), Kubernetes Security Posture Management (to maintain container orchestration best practices and flag misconfigurations), Data Security Posture Management (to classify and monitor sensitive data) and Cloud Detection and Response (to automate threat response and connect to security orchestration platforms).
Increasingly, AI-driven Security Posture Management sits across these layers to spot anomalies and predict risks from historical patterns, though this brings its own challenges around false positives and model bias that require careful adoption planning. Vendors such as Check Point offer CNAPP products including CloudGuard with unified management and compliance automation. While such examples illustrate what is available commercially, it is the architecture and functions described above that define the category itself.
Site Reliability Engineering
Reliability is not left to chance in well-run organisations. Site Reliability Engineering (SRE), pioneered and documented by Google, treats operations as a software problem and asks SRE's to protect, provide for and progress the systems that underpin user-facing services. The remit ranges from disk I/O considerations to continental-scale capacity planning, with a constant focus on availability, latency, performance and efficiency.
Error budgets, automation, toil reduction and blameless post-mortems become part of the vocabulary for teams that want to move fast without eroding trust. The approach complements Continuous Delivery by turning operational quality into something measurable and improvable, rather than a set of aspirations.
Code Quality, Testing and Documentation
For all the automation and platform power now available, the basics of code quality and testing still count. The Twelve-Factor App methodology remains relevant in encouraging declarative automation, clean contracts with the operating system, strict separation of build and run stages, stateless processes, externalised configuration, dev-prod parity and treating logs as event streams rather than files to be managed. It was first presented by developers at Heroku and continues to inform how teams design applications for cloud environments.
Documentation practices have also evolved, from literate programming's argument that source should be written as human-readable text with code woven through, to modern API documentation standards that keep codebases easier to change and onboard. General-purpose resources such as the long-running Software QA and Testing FAQ remind teams that verification and validation are distinct activities, that a spectrum of testing types is available and that common delivery problems have known countermeasures when documentation, estimation and test design are taken seriously.
AI in Software Delivery
No survey of modern software delivery can sidestep artificial intelligence. Adoption is now near-universal: the 2025 DORA State of AI-Assisted Software Development report, drawing on responses from almost 5,000 technology professionals worldwide, found that around 90% of developers now use AI as part of their daily work, with the median respondent spending roughly two hours per day interacting with AI tools. More than 80% report feeling more productive as a result. The picture is not straightforward, however. The same research found that AI adoption correlates with higher delivery instability, more change failures and longer cycle times for resolving issues because the acceleration AI brings upstream tends to expose bottlenecks in testing, code review and quality assurance that were previously hidden.
The report's central conclusion is that AI functions as an amplifier rather than a remedy. Strong teams with solid engineering foundations use it to accelerate further, while teams carrying technical debt or process dysfunction find those problems magnified rather than resolved. This means the strategic question is not simply which AI tools to adopt, but whether the underlying platform, workflow and culture are ready to benefit from them. The DORA AI Capabilities Model, published as a companion guide, identifies seven foundational practices that consistently improve AI outcomes, including a clear organisational stance on AI use, healthy data ecosystems, working in small batches and a user-centric focus. Teams without that last ingredient, the report warns, can actually see performance worsen after adopting AI.
At the tooling level, the landscape has moved quickly. Coding assistants such as GitHub Copilot have gone from novelty to standard practice in many engineering organisations, with newer entrants including Cursor, Windsurf and agentic tools like Claude Code pushing the category further. The shift from "copilot" to "agent" is significant: where earlier tools suggested completions as a developer typed, agentic systems accept a goal and execute a multistep plan to reach it, handling scaffolding, test generation, documentation and deployment checks with far less human intervention. That brings real efficiency gains and also new governance questions around traceability, code provenance and the trust that teams place in AI-generated output. Around 30% of DORA respondents reported little or no trust in code produced by AI, a figure that points to where the next wave of tooling and practice will need to focus.
Putting It Together
Translating all of this into practice looks different in every organisation, yet certain patterns recur. Teams choose a work management tool that matches the shape of their portfolio and the degree of Agile structure they need, whether that is Asana's lighter-weight task management with strong reporting or Jira's DevOps-aligned issue and sprint workflows with deep extensibility, then align on a Scrum-like cadence if iteration and feedback are priorities, or adopt hybrid approaches that sustain visibility while staying compatible with regulatory or vendor constraints.
Build, test and release are automated early so that pipelines, not people, become the route to production, and cloud native platforms keep environments reproducible and scalable across teams and geographies. Instrumentation ensures that security posture, reliability and cost are visible and managed continuously rather than episodically, and deliberate investment in engineering foundations, small batches, fast feedback and strong platform quality, creates the conditions that the evidence now shows are prerequisites for AI to deliver on its promise rather than amplify existing dysfunction.
If anything remains uncertain, it is often the sequencing rather than the destination. Few organisations can refit planning tools, delivery pipelines, platform architecture and security models all at once, and there is no definitive order that works everywhere. Starting where friction is highest and then iterating tends to be more durable than a one-shot transformation, and most of the resources cited here assume that change will be continuous rather than staged. Agile communities, cloud native foundations and security collaboratives exist because no single team has all the answers, and that may be the most practical lesson of all.
Security or Control? The debate over Google's Android verification policy
A policy announced by Google in August 2025 has ignited one of the more substantive disputes in mobile technology in recent years. At its surface, the question is about app security. Beneath that, it touches on platform architecture, competition law, the long history of Android's unusual relationship with openness, and the future of independent software distribution. To understand why the debate is so charged, it helps to understand how Android actually works.
An Open Platform With a Proprietary Layer
Android presents a genuinely unusual situation in the technology industry. The base operating system is the Android Open-Source Project (AOSP), which is publicly available and usable by anyone. Manufacturers can take the codebase and build their own systems without involvement from Google, as Amazon has with Fire OS and as projects such as LineageOS and GrapheneOS have demonstrated.
Most commercial Android devices, however, do not run pure AOSP. They ship with a proprietary bundle called Google Mobile Services (GMS), which includes Google Play Store, Google Play Services, Google Maps, YouTube and a range of other applications and developer frameworks. These components are not open source and require a licence from Google. Because most popular applications depend on Play Services for functions such as push notifications, location services, in-app payments and authentication, shipping without them is commercially very difficult. This layered architecture gives Google considerable influence over Android without owning it in the traditional proprietary sense.
Google has further consolidated this influence through a series of technical initiatives. Project Treble separated Android's framework from hardware-specific components to make operating system updates easier to deploy. Project Mainline went further, turning important parts of the operating system, including components responsible for media processing, network security and cryptography, into modules that Google can update directly via the Play Store, bypassing manufacturers and mobile carriers entirely. The result is a platform that is open source in its code, but practically centralised in how it evolves and is maintained.
The Policy and Its Rationale
Against this backdrop, Google announced in August 2025 that it would extend its developer identity verification requirements beyond the Play Store to cover all Android apps, including those distributed through third-party stores and direct sideloading. From September 2026, any app installed on a certified Android device in Brazil, Indonesia, Singapore and Thailand must originate from a developer who has registered their identity with Google. A global rollout is planned from 2027 onwards.
Google's stated rationale is grounded in security evidence. The company's own analysis found over 50 times more malware from internet-sideloaded sources than from apps available through Google Play. In 2025, Google Play Protect blocked 266 million risky installation attempts and helped protect users from 872,000 unique high-risk applications. Google has also documented a specific and recurring attack pattern in Southeast Asia, in which scammers impersonate bank representatives during phone calls, coaching victims into sideloading a fraudulent application that then intercepts two-factor authentication codes to drain bank accounts. The company argues that anonymous developer accounts make this kind of attack far easier to sustain.
The registration process requires developers to create an Android Developer Console account, submit government-issued identification and pay a one-time fee of $25. Organisations must additionally supply a D-U-N-S Number from Dun & Bradstreet. Google has stated explicitly that verified developers will retain full freedom to distribute apps through any channel they choose, and is building an "advanced flow" that would allow experienced users to install unverified apps after working through a series of clear warnings. Developers and power users will also retain the ability to install apps via Android Debug Bridge (ADB). Brazil's banking federation FEBRABAN and Indonesia's Ministry of Communications and Digital Affairs have both publicly welcomed the policy as a proportionate response to documented fraud.
What This Means for F-Droid
F-Droid, founded by Ciaran Gultnieks in 2010, operates as a community-run repository of free and open-source software (FOSS) applications for Android. For 15 years, it has demonstrated that app distribution can be transparent, privacy-respecting and accountable, setting a standard that challenges the mobile ecosystem more broadly. Every application listed on the platform undergoes checks for security vulnerabilities, and apps carrying advertising, user tracking or dependence on non-free software are explicitly flagged with an "Anti-Features" label. The platform requires no user accounts and displays no advertising. It still needs some learning, as I found when adding an app through it for a secure email service.
F-Droid operates through an unusual technical model that is worth understanding in its own right. Rather than distributing APKs produced by individual developers, it builds applications itself from publicly available source code. The resulting APKs are signed with F-Droid's own keys and distributed through the F-Droid client. This approach prioritises supply-chain transparency, since users can in theory verify that a distributed binary corresponds to the published source code. However, it also means that updates can be slower than other distribution channels, and that apps distributed via F-Droid cannot be updated over a Play Store version. Some developers have also noted that subtle differences in build configuration can occasionally cause issues.
The new verification requirement creates a structural problem that F-Droid cannot resolve independently. Many of the developers who contribute to its repository are hobbyists, academics or privacy-conscious individuals with no commercial motive and no desire to submit government identification to a third party as a condition of sharing software. F-Droid cannot compel those developers to register, and taking over their application identifiers on their behalf would directly undermine the open-source authorship model it exists to protect.
F-Droid is not alone in this concern. The policy equally affects alternative distribution models that have emerged alongside it. Tools such as Obtainium allow users to track and install updates directly from developers' GitHub or GitLab release pages, bypassing app stores entirely. The IzzyOnDroid repository provides a curated alternative to F-Droid's main catalogue. Aurora Store allows users to access the Play Store's catalogue without Google account credentials. All of these models, to varying degrees, depend on the ability to distribute software independently of Google's centralised infrastructure.
The Organised Opposition
On the 24th of February 2026, more than 37 organisations signed an open letter addressed to Google's leadership and copied to competition regulators worldwide. Signatories included the Electronic Frontier Foundation, the Free Software Foundation Europe, the Software Freedom Conservancy, Proton AG, Nextcloud, The Tor Project, FastMail and Vivaldi. Their central argument is that the policy extends Google's gatekeeping authority beyond its own marketplace into distribution channels where it has no legitimate operational role, and that it imposes disproportionate burdens on independent developers, researchers and civil society projects that pose no security risk to users.
The Keep Android Open campaign, initiated by Marc Prud'hommeaux, an F-Droid board member and founder of the alternative app store for iOS, App Fair, has been in contact with regulators in the United States, Brazil and Europe. F-Droid's legal infrastructure has been strengthened in recent years in anticipation of challenges of this kind. The project operates under the legal umbrella of The Commons Conservancy, a nonprofit foundation based in the Netherlands, which provides a clearly defined jurisdiction and a framework for legal compliance.
The Genuine Tension
Both positions have merit, and the debate is not easily resolved. The malware problem Google describes is real. Social engineering attacks of the kind documented in Southeast Asia cause genuine financial harm to ordinary users, and the anonymity afforded by unverified sideloading makes it considerably easier for bad actors to operate at scale and reoffend after being removed. The introduction of similar requirements on the Play Store in 2023 appears to have had some measurable effect on reducing fraudulent developer accounts.
At the same time, critics are right to question whether the policy is proportionate to the problem it is addressing. The people most harmed by anonymous sideloading fraud are not, in the main, the people who use F-Droid. FOSS users tend to be technically experienced, privacy-aware and deliberate in their choices. The open letter from Keep Android Open also notes that Android already provides multiple security mechanisms that do not require central registration, including Play Protect scanning, permission systems and the existing installation warning framework. The argument that these existing mechanisms are insufficient to address sophisticated social engineering, where users are coached to bypass warnings, has some force. The argument that they are insufficient to address independent FOSS distribution is harder to sustain.
There is a further tension between Google's security claims and its competitive interests. Requiring all app developers to register with Google strengthens Google's position as the de facto authority over the Android ecosystem, regardless of whether a developer uses the Play Store. That outcome may be an incidental consequence of a genuine security initiative, or it may reflect a deliberate consolidation of control. The open letter's signatories argue the former cannot be assumed, particularly given that Google faces separate antitrust investigations in multiple jurisdictions.
The Antitrust Dimension
The policy sits in a legally sensitive area. Android holds approximately 72.77 per cent of the global mobile operating system market as of late 2025, running on roughly 3.9 billion active devices. Platforms with that scale of market presence attract a different level of regulatory scrutiny than those operating in competitive markets.
In Europe, the Digital Markets Act (DMA) specifically targets large platforms designated as "gatekeepers" and explicitly requires that third-party app stores must be permitted. If Google were to use developer verification requirements in a manner that effectively prevented alternative stores from operating, European regulators would have grounds to intervene. The 2018 European Commission ruling against Google, which resulted in a €4.34 billion fine for abusing Android's market position through pre-installation requirements, established that Android's dominant position carries real obligations. That decision was largely upheld by the European courts in 2022.
In the United States, the Department of Justice has been pursuing separate antitrust cases relating to Google's search and advertising dominance, within which Android's role in channelling users toward Google services has been a recurring theme. The open letter's decision to copy regulators worldwide was not accidental. Its signatories have concluded that public documentation before enforcement begins creates pressure that private correspondence does not.
The key regulatory question is whether the verification requirements are genuinely necessary for security, and whether less restrictive measures could achieve the same goal. If the answer to either part of that question is no, regulators may conclude that the policy disproportionately disadvantages competing distribution channels.
What the Huawei and Amazon Cases Reveal
The importance of Google's service layer, and the difficulty of replicating it, can be understood by examining what happened when two large technology companies attempted to operate outside it. Here, we come to the experiences of Amazon and Huawei.
Amazon launched Fire OS in 2011, based on AOSP but with all Google components replaced by Amazon's own services. The platform succeeded in Fire tablets and streaming devices, where users primarily want access to Amazon's content. It failed entirely in smartphones. The Amazon Fire Phone, launched in 2014 and discontinued within a year, could not attract enough developer support to make it viable as a general-purpose device. The absence of Google Play Services meant that many popular applications were missing or required separate builds. This experience showed that Android's openness, at the operating system level, does not automatically translate into a competitive ecosystem. The real power lies in the service layer and the developer infrastructure built around it.
The Huawei case illustrates the same point more sharply. In May 2019, the United States government placed Huawei on its Entity List, restricting American firms from supplying technology to the company. Huawei had a 20 per cent global smartphone market share in 2019, which dropped to virtually zero after the restrictions took effect. Since Huawei could still use the AOSP codebase, the operating system was not the problem. The problem was Google Mobile Services. Without access to the Play Store, Google Maps, YouTube and the developer APIs that underpin much of the application ecosystem, Huawei phones became commercially unviable in international markets that expected those services.
Huawei's international smartphone market share, which had been among the top three, rapidly fell to outside the top five. The company's consumer business revenue declined by nearly 50 per cent in 2021. Huawei's subsequent efforts to build its own replacement ecosystem, Huawei Mobile Services and AppGallery, achieved limited success outside China, where the domestic mobile ecosystem already operates largely independently of Google. Both the Amazon and Huawei cases confirm that Android's formal openness does not neutralise Google's practical influence over the platform.
The Comparison With Apple
It is worth noting where the comparison with Apple, often invoked in these debates, holds and where it breaks down. Apple designs its hardware, controls its operating system, and has historically permitted application installation only through its App Store. That degree of vertical integration meant that, under the DMA, Apple faced requirements to allow alternative app marketplaces and sideloading mechanisms that represented fundamental changes to how iOS operates. Google already permits these behaviours on Android, which is why the DMA's impact on its platform is more limited.
However, the direction of travel matters. Critics argue that policies like mandatory developer verification, combined with Google's control of the update pipeline and the practical dependency of the ecosystem on Play Services, are gradually moving Android toward a model that is more controlled in practice than its open-source origins would suggest. The formal difference between Android and iOS may be narrowing, even if it has not disappeared.
Where Things Stand
The verification scheme opened to all developers in March 2026, with enforcement beginning in September 2026 in four initial countries. Google has offered assurances that sideloading is not being eliminated and that experienced users will retain a route to install unverified software. Critics point out that this route has not yet been specified clearly enough for independent organisations to assess whether it would serve as a workable mechanism for FOSS distribution. Until it is demonstrated and tested in practice, F-Droid and its allies have concluded that it cannot be relied upon.
F-Droid is not facing immediate closure. It continues to host over 3,800 applications and its governance and infrastructure have been strengthened in recent years. Its continued existence, and the existence of the broader ecosystem of independent Android distribution tools, depends on sideloading remaining practically viable. The outcome will be shaped by how Google implements the advanced flow provision, by the response of competition regulators in Europe and elsewhere, and by whether independent developers in sufficient numbers choose to comply with, work around or resist the new requirements.
Its story is, in this respect, a concrete test case for a broader question: whether the formal openness of a platform is sufficient to guarantee genuine openness in practice, or whether the combination of service dependencies, update mechanisms and registration requirements can produce a functionally closed system without formally becoming one. The answer will have implications well beyond a single FOSS app repository.
Lessons learned during migrations to Grav CMS from Textpattern
After the most of four years since the arrival of Textpattern 4.8.8, Textpattern 4.9.0 was released. Since, I was running two subsites using the software, I tried one of them out with the new version. That broke an elderly navigation plugin that no longer appears in any repository, prompting a rollback, which was successful. Even so, it stirred some curiosity about alternatives to this veteran of the content management system world, which is pushing on for twenty-two years of age. That might have been just as well, given the subsequent release of Textpattern 4.9.1 because of two reported security issues, one of which affecting all preceding versions of the software.
Well before that came to light, there had been a chat session in a Gemini app on a mobile which travelling on a bus. This started with a simple question about alternatives to Textpattern. The ensuing interaction led me to choose Grav CMS after one other option turned out to involve a subscription charge; A personal website side hustle generating no revenue was not going to become a more expensive sideline than it already was, the same reasoning that stops me paying for WordPress plugins.
Exploring Known Options
Without any recourse to AI capability, I already had options. While WordPress was one of those that was well known to me, the organisation of the website was such that it would be challenging to get everything placed under one instance and I never got around to exploring the multisite capability in much depth. Either way, it would prove to involve quite an amount of restructuring, Even having multiple instances would mean an added amount of maintenance, though I do automate things heavily. The number of attack surfaces because of database dependence is another matter.
In the past, I have been a user of Drupal, though its complexity and the steepness of the associated learning curve meant that I never exploited it fully. Since those were pre-AI days, I wonder how things would differ now. Nevertheless, the need to make parts of a website fit in with each other was another challenge that I failed to overcome in those days. Thus, this content framework approach was not one that I wished to use again. In short, this is an enterprise-grade tool that may be above the level of personal web tinkering, and I never did use its capabilities to their full extent.
The move away from Drupal brought me to Hugo around four years ago. That too presents a learning curve, though its inherent flexibility meant that I could do anything that I want with it once I navigated its documentation and ironed out oversights using web engine searches. This static website generator is what rebuilds a public transport website, a history website comprised of writings by my late father together with a website for my freelancing company. There is no database involved, and you can avoid having any dynamic content creation machinery on the web servers too. Using Git, it is possible to facilitate content publishing from anywhere as well.
Why Grav?
Of the lot, Hugo had a lot going for it. The inherent flexibility would not obstruct getting things to fit with a website wide consistency of appearance, and there is nothing to stop one structuring things how they wanted. However, Grav has one key selling point in comparison: straightforward remote production of content without recourse to local builds being uploaded to a web server. That decouples things from needing one to propagate the build machinery across different locations.
Like Hugo, Grav had an active project behind it and a decent supply of plugins and an architecture that bested Textpattern and its usual languid release cycle. The similarity also extended as far as not having to buy into someone else's theme: any theming can be done from scratch for consistency of appearance across different parts of a website. In Grav's case, that means using the Twig PHP templating engine, another thing to learn and reminiscent of Textpattern's Textile as much as what Hugo itself has.
The centricity of Markdown files was another area of commonality, albeit with remote editing. If you are conversant with page files having a combination of YAML front matter and subsequent page content from Hugo, Grav will not seem so alien to you, even if it has a web interface for editing that front matter. This could help if you need to edit the files directly for any reason.
That is never to say that there were no things to learn, for there was plenty of that. For example, it has its own way of setting up modular pages, an idea that I was to retrofit back into a Hugo website afterwards. This means care with module naming as well as caching, editor choice and content collections, each with their own uniqueness that rewards some prior reading. A learning journey was in the offing, a not attractive part of the experience in any event.
Considerations
There have been a number of other articles published here regarding the major lessons learned during the transitions from Textpattern to Grav. Unlike previous experiences with Hugo, another part of this learning was the use of AI as part of any debugging. At times, there was a need to take things step by step, interacting with the AI instead of trying out a script that it had put my way. There are times when one's own context window gets overwhelmed by the flow of text, meaning that such behaviour needs to be taken in hand.
Another thing to watch is that human consultation of the official documentation is not neglected in a quest for speed that lets the AI do that for you; after all, this machinery is fallible; nothing we ever bring into being is without its flaws. Grav itself also comes from a human enterprise that usefully includes its own Discord community. The GitHub repository was not something to which I had recourse, even if the Admin plugin interface has prompts for reporting issues on there. Here, I provide a synopsis of the points to watch that may add to the help provided elsewhere.
Choosing an Editor
By default, Grav Admin uses CodeMirror as its content editor. While CodeMirror is well-suited to editing code, offering syntax highlighting, bracket matching and multiple cursors, it renders its editing surface in a way that standard browser extension APIs cannot reach. Grammar checkers and spell-check extensions such as LanguageTool rely on native editable elements to detect text, and CodeMirror does not use these. The result is that browser-based writing tools produce no output in Grav Admin at all, which is a confirmed architectural incompatibility rather than a configuration issue documented in the LanguageTool issue tracker.
This can be addressed by replacing CodeMirror using the TinyMCE Editor Integration plugin, installable directly from the Admin plugin interface, which brings a familiar style of editor that browser extensions can access normally. Thus, LanguageTool functionality is restored, the writing workflow stays inside Grav Admin and the change requires only a small amount of configuration to prevent TinyMCE from interfering with Markdown files in undesirable ways. Before coming across the TinyMCE Editor plugin, I was seriously toying with the local editing option centred around a Git-based workflow. Here, using VS Code with the LanguageTool extension like I do for Hugo websites remained a strong possibility. The plugin means that the need to do this is not as pressing as it otherwise might be.
None of this appears to edit Twig templates and other configuration files unless one makes use of the Editor plugin. My brief dalliance with this revealed a clunky interface and interference with the appearance of the website, something that I never appreciated when I saw it with Drupal. Thus, the plugin was quickly removed, and I do not miss it. As it happened, editing and creating files over an SSH connection with a lightweight terminal editor worked well enough for me during the setup phase anyway. If I wanted a nicer editing experience, then a Git-based approach would allow local editing in VSCode before pushing the files back onto the server.
Grav Caching
Unlike WordPress, which requires plugins to do so, Grav maintains its own internal cache for compiled pages and assets. Learning to work with it is part of understanding the platform: changes to CSS, JavaScript and other static assets are served from this cache until it is refreshed. That can be accomplished using the admin panel or by removing the contents of the cache directory directly. Once this becomes second nature, it adds very little overhead to the development process.
Modular Pages
On one of the Textpattern subsites, I had set up the landing page in a modular fashion. This carried over to Grav, which has its own way of handling modular pages. There, the modular page system assembles a single page from the files found within a collection of child folders, each presenting a self-contained content block with its own folder, Markdown file and template.
All modules render together under a single URL; they are non-routable, meaning visitors cannot access them directly. When the parent folder contains a modular.md file, the name tells Grav to use the modular.html.twig template and whose front matter defines which modules to include and in what order.
Module folders are identified by an underscore at the start of their name, and numeric prefixes control the display sequence. The prefix must come before the underscore: _01.main is the correct form. For a home page with many sections this structure scales naturally, with folder names such as 01._title, 04._ireland or 13._practicalities-inspiration making the page architecture immediately readable from the file system alone.
Each module's Markdown filename determines which template renders it: a file named text.md looks for text.html.twig in the theme's modular templates folder. The parent modular.md assembles the modules using @self.modular to collect them, with a custom order list giving precise control over sequence. Once the folder naming convention and the template matching relationship are clear, the system is very workable.
Building Navigation
Given that the original impetus for leaving Textpattern was a broken navigation plugin, ensuring that Grav could replicate the required menu behaviour was a matter of some importance. Grav takes a different approach to navigation from database-driven systems, deriving its menu structure directly from the content directory tree using folder naming conventions and front matter flags rather than a dedicated menu editor.
Top-level navigation is driven by numerically prefixed subfolders within the content directory (pages), so a structure such as 01.home, 02.about and 03.blog yields an ordered working menu automatically. Visibility can be fine-tuned without renaming folders by setting visible: true or visible: false in a page's YAML front matter, and menu labels can be shortened for navigation purposes using the menu: field while retaining a fuller title for the page itself.
The primary navigation loop iterates over the visible children of the pages tree and uses the active and activeChild flags on each page object to highlight the current location, whether the visitor is on a given top-level page directly or somewhere within its subtree. A secondary menu for the current section is assembled by first identifying the active top-level page and then rendering its visible children as a list. Testing for activeChild as well as active in the secondary menu is important, as omitting it means that visitors to grandchild pages see no item highlighted at all. The approach differs from what was possible with Textpattern, where a single composite menu could drill down through the full hierarchy, but displaying one menu for pages within a given section alongside another showing the other sections proves to be a workable and context-sensitive alternative.
Setting Up RSS Feeds
Because Grav does not support the generation of RSS feeds out of the box, it needs a plugin and some extra configuration. The latter means that you need to get your head around the Grav concept of a collection because without it, you will not see anything in your feed. In contrast, database-driven platforms like WordPress or Drupal push out the content by default, which may mean that you are surprised when you first come across how Grav needs you to specify the collections explicitly.
There are two details that make performing configuration of a feed straightforward once understood. The first is that Grav routes do not match physical folder paths: a folder named 03.deliberations on disk is referenced in configuration as /deliberations, since the numeric prefix controls navigation ordering but does not appear in the route, that is the actual web page address. The second is the choice between @page.children, which collects only the immediate children of a folder, and @page.descendants, which collects recursively through all subdirectories. The collection definition belongs in the feed page's front matter, specifying where the content lives, how it should be ordered and in which direction.
Where All This Led
Once I got everything set in place, the end results were pleasing, with much learned along the way. Web page responsiveness was excellent, an experience enhanced by the caching of files. In the above discussion, I hardly mentioned the transition of existing content. For one subsite, this was manual because the scale was smaller, and the Admin plugin's interface made everything straightforward such that all was in place after a few hours of work. In the case of the other, the task was bigger, so I fell on an option used for a WordPress to Hugo migration: Python scripting. That greatly reduced the required effort, allowing me to focus on other things like setting up a modular landing page. The whole migration took around two weeks, all during time outside my client work. There are other places where I can use Grav, which surely will add to what I already have learned. My dalliance with Textpattern is feeling a little like history now.
Security is a behaviour, not a tick-box
Cybersecurity is often discussed in terms of controls and compliance, yet most security failures begin and end with human action. A growing body of practice now places behaviour at the centre, drawing on psychology, neuroscience, history and economics to help people replace old habits with new ones. George Finney's Well Aware Security have built its entire approach around this idea, reframing awareness training as a driver of measurable outcomes rather than a box-ticking exercise, with coaches helping colleagues identify and build upon their existing strengths. It is also personal by design, using insights about how minds work to guide change one habit at a time rather than expecting wholesale transformation overnight.
This emphasis on behaviour is not a dismissal of technical skill so much as a reminder that skill alone is insufficient. Security is not a competency you either possess or lack; it is a behaviour that can be learned, reinforced and normalised. As social beings, we have always gathered for mutual protection, meaning the desire to contribute to collective security is already present in most people. Turning that impulse into daily action requires structure and patience, and it thrives when a supportive culture takes root.
Culture matters because norms are powerful. In a team where speed and convenience consistently override prudence, individuals who try to do the right thing can feel isolated. Conversely, when an organisation embraces cybersecurity at every level, a small group can create sufficient leverage to shift practices across the whole business. Research has found that organisations with below-average culture ratings are significantly more likely to experience a data breach than their peers, and controls alone cannot close that gap when behaviours are pulling in the opposite direction.
This is why advocates of habit-based security speak of changing one step at a time, celebrating progress and maintaining momentum. The same thinking underpins practical measures at home and at work, where small changes in how devices and data are managed can reduce risk materially without making technology difficult to use.
Network-Wide Blocking with Pi-hole
One concrete example of this approach is network-wide blocking of advertising and tracking domains using a DNS sinkhole. Pi-hole has become popular because it protects all devices on a network without requiring any client-side software to be installed on each one. It runs lightly on Linux, blocks content outside the browser (such as within mobile apps and smart TVs) and can optionally act as a DHCP server so that new devices are protected automatically upon joining the network.
Pi-hole's web dashboard surfaces insights into DNS queries and blocked domains, while a command-line interface and an API offer further control for those who need it. It caches DNS responses to speed up everyday browsing, supports both IPv4 and IPv6, and scales from small households to environments handling very high query volumes. The project is free and open source, sustained by donations and volunteer effort.
Choosing What to Block
Selecting what to block is a point at which behaviour and technology intersect. It is tempting to load every available blocklist in the hope of maximum protection, but as Avoid the Hack notes in its detailed guide to Pi-hole blocklists, more is not always better. Many lists draw from common sources, so stacking them can add redundancy without improving coverage and may increase false positives (instances where legitimate sites are mistakenly blocked).
The most effective approach begins by considering what you want to block and why, then balancing that against the requirements of your devices and services. Blocking every Microsoft domain, for instance, could disrupt operating system updates or break websites that rely on Azure. Likewise, blacklisting all domains belonging to a streaming or gaming platform may render apps unusable. Aggressive configurations are possible, but they work best when paired with careful allow-listing of domains essential to your services. Allow lists require ongoing upkeep as services move or change, so they are not a one-off exercise.
Recommended Blocklists
A practical starting point is the well-maintained Steven Black unified hosts file, which consolidates several reputable sources and many users find sufficient straight away. From there, curated collections help tailor coverage further. EasyList provides a widely trusted foundation for blocking advertising and integrates with browser extensions such as uBlock Origin, while its companion list EasyPrivacy can add stronger tracking protection at the cost of occasional breakage on certain sites.
Hagezi maintains a comprehensive set of DNS blocklists, including "multi" variants of different sizes and aggression levels, built from numerous sources. Selecting one of the multi variants is usually preferable to layering many individual category lists, which can reintroduce the overlap you were trying to avoid. Firebog organises its lists by risk: green entries carry a lower risk of causing breakage, while blue entries are more aggressive, giving you the option to mix and match according to your comfort level.
Some projects bundle many sources into a single combination list. OISD is one such option, with its Basic variant focusing mainly on advertisements, Full extending to malware, scams, phishing, telemetry and tracking, and a separate NSFW set covering adult content. OISD updates roughly every 24 hours and is comprehensive enough that many users would not need additional lists. The trade-off is placing a significant degree of trust in a single maintainer and limiting the ability to assign different rule sets to different device groups within Pi-hole, so it is worth weighing convenience against flexibility before committing.
The Blocklist Project organises themed lists covering advertising, tracking, malware, phishing, fraud and social media domains, and these work with both Pi-hole and AdGuard Home. The project completed a full rebuild of its underlying infrastructure, replacing an inconsistent mix of scripts with a properly tested Python pipeline, automated validation on pull requests and a cleaner build process.
Existing list URLs are unchanged, so anyone already using the project's lists need not reconfigure anything. That said, the broader principle holds regardless of which project you use: blocklists can become outdated if not actively maintained, reducing their effectiveness over time.
Using Regular Expressions
For more granular control, Pi-hole supports regular expressions to match domain patterns. Regex entries are powerful and can be applied both to block and to allow traffic, but they reward specificity. Broad patterns risk false positives that break legitimate services, so community-maintained regex recommendations are a safer starting point than writing everything from scratch. Pi-hole's own documentation explains how expressions are evaluated in detail. Used judiciously, regex rules extend what list-based blocking can achieve without turning maintenance into an ongoing burden.
Installing Pi-hole
Installation is straightforward. Pi-hole can be deployed in a Linux container or directly on a supported operating system using an automated installer that asks a handful of questions and configures everything in under ten minutes. Once running, you point clients to use it as their DNS resolver, either by setting DHCP options on your router, so devices adopt it automatically, or by updating network settings on each device individually. Pairing Pi-hole with a VPN extends ad blocking to mobile devices when away from home, so limited data plans go further by not downloading unwanted content. Day-to-day management is handled via the web interface, where you can add domains to block or allow lists, review query logs, view long-term statistics and audit entries, with privacy modes that can be tuned to your environment.
Device-Level Adjustments
Network filtering is one layer in a defence-in-depth approach, and a few small device-level changes can reduce friction without sacrificing safety. Bitdefender's Safepay, for example, is designed to isolate banking and shopping sessions within a hardened browser environment. If its prompts become intrusive, you can turn off notifications by opening the Bitdefender interface, selecting Privacy, then Safepay settings, and toggling off both Safepay notifications and the option to use a VPN with Safepay. Bookmarked sites can still auto-launch Safepay unless you also disable the automatic-opening option. Even with notifications suppressed, you can start Safepay manually from the dashboard whenever you want the additional protection.
On Windows, unwanted prompts from Microsoft Edge about setting it as the default browser can be handled without resorting to arcane steps. The Windows Club covers the full range of methods available. Dismissing the banner by clicking "Not now" several times usually prevents it from reappearing, though a browser update or reset may bring the message back. Advanced users can disable the recommendations via edge://flags, or apply a registry policy under HKEY_CURRENT_USERSoftwarePoliciesMicrosoftEdge by setting DefaultBrowserSettingEnabled to 0. In older environments such as Windows 7, a Group Policy setting exists to stop Edge checking whether it is the default browser. These changes should be made with care, particularly in managed environments where administrators enforce default application associations across the estate.
Knowing What Your Devices Reveal
Awareness also begins with understanding what your devices reveal to the wider internet. Services like WhatIsMyIP.com display your public IP address, the approximate location derived from it and your internet service provider. For most home users, a public IP address is dynamic rather than fixed, meaning it can change when a router restarts or when an ISP reallocates addresses; on mobile networks it may change more frequently still as devices move between towers and routing systems.
Such tools also provide lookups for DNS and WHOIS information, and they explain the difference between public and private addressing. Complementary checks from WhatIsMyBrowser.com summarise your browser version, whether JavaScript and cookies are enabled, and whether known trackers or ad blockers are detected. Sharing that information with support teams can make troubleshooting considerably faster, since it quickly narrows down where problems are likely to sit.
Protecting Your Accounts
Checking for Breached Credentials
Account security is another area where habits do most of the heavy lifting. Checking whether your email address appears in known data breaches via Have I Been Pwned helps you decide when to change passwords or enable stronger protections. The service, created by security researcher Troy Hunt, tracks close to a thousand breached websites and over 17.5 billion compromised accounts, and offers notifications as well as a searchable dataset. Finding your address in a breach does not mean your account has been taken over, but it does mean you should avoid reusing passwords and should enable two-factor authentication wherever it is available.
Two-Factor Authentication
Authenticator apps provide time-based codes that attackers cannot guess, even when armed with a reused password. Aegis Authenticator is a free, open-source option for Android that stores your tokens in an encrypted vault with optional biometric unlock. It offers a clean interface with multiple themes, supports icons for quick identification and allows import and export from a wide range of other apps. Backups can be automatic, and you remain in full control, since the app works entirely offline without advertisements or tracking.
For users who prefer cloud backup and multi-device synchronisation, Authy from Twilio offers a popular alternative that pairs straightforward setup with secure backup and support for using tokens across more than one device. Both approaches strengthen accounts significantly, and the choice often comes down to whether you value local control above all else or prefer the convenience of synchronisation.
Password Management
Strong, unique passwords remain essential even alongside two-factor authentication. KeePassXC is a cross-platform password manager for Windows, macOS and Linux that keeps your credentials in an encrypted database stored wherever you choose, rather than on a vendor's servers. It is free and open source under the GPLv3 licence, and its development process is publicly visible on GitHub.
The project has undergone rigorous external scrutiny. On the 17th of November 2025, KeePassXC version 2.7.9 was awarded a Security Visa by the French National Cybersecurity Agency (ANSSI) under its First-level Security Certification (CSPN) programme, with report number ANSSI-CSPN-2025/16. The certification is valid for three years and is recognised in France and by the German Federal Office for Information Security. More recent releases such as version 2.7.11 focus on bug fixes and usability improvements, including import enhancements, better password-generation feedback and refinements to browser integration. Because data are stored locally, you can place the database in a private or shared cloud folder if you wish to sync between devices, while encryption remains entirely under your control.
Secure Email with Tuta
Email is a frequent target for attackers and a common source of data leakage, so the choice of client can make a meaningful difference. Tuta provides open-source desktop applications for Linux, Windows and macOS that bring its end-to-end encrypted mail and calendar to the desktop with features that go beyond the web interface. The clients are signed so that updates can be verified independently, and Tuta publishes its public key, so users can confirm signatures themselves.
There is a particular focus on Linux, with support for major distributions including Ubuntu, Debian, Fedora, Arch Linux, openSUSE and Linux Mint. Deep operating-system integration enables conveniences such as opening files as attachments directly from context menus on Windows via MAPI, setting Tuta as the default mail handler, using the system's secret storage and applying multi-language spell-checking. Hardware key support via U2F is available across all desktop clients, and offline mode means previously indexed emails, calendars and contacts remain accessible without an internet connection.
Tuta does not support IMAP because downloading and storing messages unencrypted on devices would undermine its end-to-end encryption model. Instead, features such as import and export are built directly into the clients; paid plans including Legend and Unlimited currently include email import that encrypts messages locally before uploading them. The applications are built on Electron to maintain feature parity across platforms, and Tuta offers the desktop clients free to all users to ensure that core security benefits are not gated behind a subscription.
Bringing Culture and Tooling Together
These individual strands reinforce one another when combined. A network-wide blocker reduces exposure to malvertising and tracking while nudging everyone in a household or office towards safer defaults. Small device-level settings cut noise without removing protection, which helps people maintain good habits because security becomes less intrusive. Visibility tools demystify what the internet can see and how browsers behave, which builds confidence. Password managers and authenticator apps make strong credentials and second factors the norm rather than the exception, and a secure email client protects communications by default.
None of these steps requires perfection, and each can be introduced one at a time. The key is to focus on outcomes, think like a coach and make security personal, so that habits take root and last.
There is no single fix that will stop every attack. One approach that does help is consistent behaviour supported by thoughtful choices of software and services. Start with one change that removes friction while adding protection, then build from there. Over time, those choices shape a culture in which people feel they have a genuine role in keeping themselves and their organisations safe, and the technology they rely upon reflects that commitment.
Running local Large Language Models on desktop computers and workstations with 8GB VRAM
Running large language models locally has shifted from being experimental to practical, but expectations need to match reality. A graphics card with 8 GB of VRAM can support local workflows for certain text tasks, though the results vary considerably depending on what you ask the models to do.
Understanding the Hardware Foundation
The Critical Role of VRAM
The central lesson is that VRAM is the engine of local performance on desktop systems. Whilst abundant system RAM helps avoid crashes and allows larger contexts, it cannot replace VRAM for throughput.
Models that fit in VRAM and keep most of their computation on the GPU respond promptly and maintain a steady pace. Those that overflow to system RAM or the CPU see noticeable slowdowns.
Hardware Limitations and Thresholds
On a desktop GPU with 8 GB of VRAM, this sets a practical ceiling. Models in the 7 billion to 14 billion parameter range fit comfortably enough to exploit GPU acceleration for typical contexts.
Much larger models tend to offload a significant portion of the work to the CPU. This shows up as pauses, lower token rates and lag when prompts become longer.
Monitoring GPU Utilisation
GPU utilisation is a reliable way to gauge whether a setup is efficient. When the GPU is consistently busy, generation is snappy and interactive use feels smooth.
A model like llama3.1:8b can run almost entirely on the GPU at a context length of 4,096 tokens. This translates into sustained responsiveness even with multi-paragraph prompts.
Model Selection and Performance
Choosing the Right Model Size
A frequent instinct is to reach for the largest model available, but size does not equal usefulness when running locally on a desktop or workstation. In practice, models between 7B and 14B parameters are what you can run on this class of hardware, though what they do well is more limited than benchmark scores might suggest.
What These Models Actually Do Well
Models in this range handle certain tasks competently. They can compress and reorganise information, expand brief notes into fuller text, and maintain a reasonably consistent voice across a draft. For straightforward summarisation of documents, reformatting content or generating variations on existing text, they perform adequately.
Where things become less reliable is with tasks that demand precision or structured output. Coding tasks illustrate this gap between benchmarks and practical use. Whilst llama3.1:8b scores 72.6% on the HumanEval benchmark (which tests basic algorithm problems), real-world coding tasks can expose deeper limitations.
Commit message generation, code documentation and anything requiring consistent formatting produce variable results. One attempt might give you exactly what you need, whilst the next produces verbose or poorly structured output.
The gap between solving algorithmic problems and producing well-formatted, professional code output is significant. This inconsistency is why larger local models like gpt-oss-20b (which requires around 16GB of memory) remain worth the wait despite being slower, even when the 8GB models respond more quickly.
Recommended Models for Different Tasks
Llama3.1:8b handles general drafting reasonably well and produces flowing output, though it can be verbose. Benchmark scores place it above average for its size, but real-world use reveals it is better suited to free-form writing than structured tasks.
Phi3:medium is positioned as stronger on reasoning and structured output. In practice, it can maintain logical order better than some alternatives, though the official documentation acknowledges quality of service limitations, particularly for anything beyond standard American English. User reports also indicate significant knowledge gaps and over-censorship that can affect practical use.
Gemma3 at 12B parameters produces polished prose and smooths rough drafts effectively when properly quantised. The Gemma 3 family offers models from 1B to 27B parameters with 128K context windows and multimodal capabilities in the larger sizes, though for 8GB VRAM systems you are limited to the 12B variant with quantisation. Google also offers Gemma 3n, which uses an efficient MatFormer architecture and Per-Layer Embedding to run on even more constrained hardware. These are primarily optimised for mobile and edge devices rather than desktop use.
Very large models remain less efficient on desktop hardware with 8 GB VRAM. Attempting to run them results in heavy CPU offloading, and the performance penalty can outweigh any quality improvement.
Memory Management and Configuration
Managing Context Length
Context length sits alongside model size as a decisive lever. Every extra token of context demands memory, so doubling the window is not a neutral choice.
At around 4,096 tokens, most of the well-matched models stay predominantly on the GPU and hold their speed. Push to 8,192 or beyond, and the memory footprint swells to the point where more of the computation ends up taking place on the CPU and in system RAM.
Ollama's Keep-Alive Feature
Ollama keeps already loaded models resident in VRAM for a short period after use so that the next call does not pay the penalty of a full reload. This is expected behaviour and is governed by a keep_alive parameter that can be adjusted to hold the model for longer if a burst of work is coming, or to release it sooner when conserving VRAM matters.
Practical Memory Strategies
Breaking long jobs into a series of smaller, well-scoped steps helps both speed and stability without constraining the quality of the end result. When writing an article, for instance, it can be more effective to work section by section rather than asking for the entire piece in one pass.
Optimising the Workflow
The Benefits of Streaming
Streaming changes the way output is experienced, rather than the content that is ultimately produced. Instead of waiting for a block of text to arrive all at once, words appear progressively in the terminal or application. This makes longer pieces easier to manage and revise on the fly.
Task-Specific Model Selection
Because each model has distinct strengths and weaknesses, matching the tool to the task matters. A fast, GPU-friendly model like llama3.1:8b works for general writing and quick drafting where perfect accuracy is not critical. Phi3:medium may handle structured content better, though it is worth testing against your specific use case rather than assuming it will deliver.
Understanding Limitations
It is important to be clear about what local models in this size range struggle with. They are weak at verifying facts, maintaining strict factual accuracy over extended passages, and providing up-to-date knowledge from external sources.
They also perform inconsistently on tasks requiring precise structure. Whilst they may pass coding benchmarks that test algorithmic problem-solving, practical coding tasks such as writing commit messages, generating consistent documentation or maintaining formatting standards can reveal deeper limitations. For these tasks, you may find yourself returning to larger local models despite preferring the speed of smaller ones.
Integration and Automation
Using Ollama's Python Interface
Ollama integrates into automated workflows on desktop systems. Its Python package allows calls from scripts to automate summarisation, article generation and polishing runs, with streaming enabled so that logs or interfaces can display progress as it happens. Parameters can be set to control context size, temperature and other behavioural settings, which helps maintain consistency across batches.
Building Production Pipelines
The same interface can be linked into website pipelines or content management tooling, making it straightforward to build a system that takes notes or outlines, expands them, revises the results and hands them off for publication, all locally on your workstation. The same keep_alive behaviour that aids interactive use also benefits automation, since frequently used models can remain in VRAM between steps to reduce start-up delays.
Recommended Configuration
Optimal Settings for 8 GB VRAM
For a desktop GPU with 8 GB of VRAM, an optimal configuration builds around models that remain GPU-efficient whilst delivering acceptable results for your specific tasks. Llama3.1:8b, phi3:medium and gemma3:12b are the models that fit this constraint when properly quantised, though you should test them against your actual workflows rather than relying on general recommendations.
Performance Monitoring
Keeping context windows around 4,096 tokens helps sustain GPU-heavy operation and consistent speeds, whilst streaming smooths the experience during longer outputs. Monitoring GPU utilisation provides an early warning if a job is drifting into a configuration that will trigger CPU fallbacks.
Planning for Resource Constraints
If a task does require more memory, it is worth planning for the associated slowdown rather than assuming that increasing system RAM or accepting a bigger model will compensate for the VRAM limit. Tuning keep_alive to the rhythm of work reduces the frequency of reloads during sessions and helps maintain responsiveness when running sequences of related prompts.
A Practical Content Creation Workflow
Multi-Stage Processing
This configuration supports a division of labour in content creation on desktop systems. You start with a compact model for rapid drafting, switch to a reasoning-focused option for structured expansions if needed, then finish with a model known for adding polish to refine tone and fluency. Insert verification steps between stages to confirm facts, dates and citations before moving on.
Because each stage is local, revisions maintain privacy, with minimal friction between idea and execution. When integrated with automation via Ollama's Python tools, the same pattern can run unattended for batches of articles or summaries, with human review focused on accuracy and editorial style.
In Summary
Desktop PCs and workstations with 8 GB of VRAM can support local LLM workflows for specific tasks, though you need realistic expectations about what these models can and cannot do reliably. They handle basic text generation and reformatting, though prone to hallucinations and misunderstandings. They struggle with precision tasks, structured output and anything requiring consistent formatting. Whilst they may score well on coding benchmarks that test algorithmic problem-solving, practical coding tasks can reveal deeper limitations.
The key is to select models that fit the VRAM envelope, keep context lengths within GPU-friendly bounds, and test them against your actual use cases. For tasks where local models prove inconsistent, such as generating commit messages or producing reliably structured output, larger local models like gpt-oss-20b (which requires around 16GB of memory) may still be worth the wait despite being slower. Local LLMs work best when you understand their limitations and use them for what they genuinely do well, rather than expecting them to replace more capable models across all tasks.
Additional Resources
- Ollama Official Documentation
- Ollama GitHub Repository
- Hugging Face model hub
- LM Studio for local model management
- Jan AI for local AI deployment
- Meta Llama 3.1 Model Card
- Microsoft Phi-3 Documentation
- Google Gemma 3 Overview
- Google Gemma 3n Overview
- Artificial Analysis for model benchmarks