Technology Tales

Notes drawn from experiences in consumer and enterprise technology

TOPIC: LITERATE PROGRAMMING

Open Source Tools for Pharmaceutical Clinical Data Reporting, Analysis & Regulatory Submissions

25th March 2026

There was a time when SAS was the predominant technology for clinical data reporting, analysis and submission work in the pharmaceutical industry. Within the last decade, open-source alternatives have gained considerable traction, and the {pharmaverse} initiative has arisen from this. Its packages span everything from dataset creation (SDTM and ADaM) to output production, with utilities for test data and submission activities along the way. The effort also marks a change from each company working by itself to sharing and collaborating with others. Here then is the outcome of those endeavours.

{admiral}

Designed as an open-source, modular R toolbox, the {admiral} package assists in the creation of ADaM datasets through reusable functions and utilities tailored for pharmaceutical data analysis. Core packages handle general ADaM derivations whilst therapeutic area-specific extensions address more specialised needs, with a structured release schedule divided into two phases. Usability, simplicity and readability are central priorities, supported by comprehensive documentation, vignettes and example scripts. Community contributions and collaboration are actively encouraged, with the aim of fostering a shared, industry-wide approach to ADaM development in R. Related packages for test data and metadata manipulation complement the main toolkit, alongside a commitment to consistent coding practices and accessible code.
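As a sketch of the derivation style involved, the following assumes that {admiral}, {dplyr} and the {pharmaversesdtm} test data are installed; the dataset and variable names follow CDISC conventions, but the code is illustrative rather than a production derivation.

```r
# Sketch only: derive a treatment start date for an ADSL-style dataset
library(admiral)
library(dplyr)

data("dm", package = "pharmaversesdtm")
data("ex", package = "pharmaversesdtm")

adsl <- dm %>%
  # Merge each subject's earliest exposure start date onto the DM records
  derive_vars_merged(
    dataset_add = ex,
    by_vars = exprs(STUDYID, USUBJID),
    order = exprs(EXSTDTC),
    mode = "first",
    new_vars = exprs(TRTSDTC = EXSTDTC)
  ) %>%
  # Convert the character --DTC value into a numeric date variable, TRTSDT
  derive_vars_dt(new_vars_prefix = "TRTS", dtc = TRTSDTC)
```

The reusable-function approach shows through here: `derive_vars_merged()` and `derive_vars_dt()` are general building blocks that get combined into study-specific derivations.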

{aNCA}

Maintained by contributors from F. Hoffmann-La Roche AG, {aNCA} is an open-source R Shiny application that makes Non-Compartmental Analysis (NCA) accessible to scientists working with clinical and pre-clinical pharmacokinetic datasets. Users can upload their own data, apply pre-processing filters and run NCA with configurable options including half-life calculation rules, manual slope selection and user-defined AUC intervals. Results are explorable through interactive box plots, scatter plots and summary statistics tables, and can be exported in `PP` and `ADPP` dataset domains alongside a reproducible R script. Analysis settings can be saved and reloaded for continuity across sessions. Installation is available from CRAN via a standard install command, from GitHub using the `pak` package manager, or by cloning the repository directly for those wishing to contribute.

{autoslider.core}

The {autoslider.core} package generates standard table templates commonly used in Study Results Endorsement Plans. Its principal purpose is to reduce duplicated effort between statisticians and programmers when creating slides. Available on CRAN, the package can be installed either through the standard installation method or directly from GitHub for the latest development version.

{cards}

Supporting the CDISC Analysis Results Standard, the {cards} package facilitates the creation of analysis results data sets that enhance automation, reproducibility and consistency in clinical research. Structured data sets for statistical summaries are generated to enable tasks such as quality control, pre-calculating statistics for reports and combining results across studies. Tools for creating, modifying and analysing these data sets are provided, with the {cardx} extension offering additional functions for statistical tests and models. Installation is available through CRAN or GitHub, with resources including documentation and community contributions.
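A minimal sketch of the idea, assuming {cards} is installed; `ADSL` is the example dataset bundled with the package, and the output is a structured data frame of statistics rather than a formatted table.

```r
# Sketch: build an analysis results dataset of continuous summaries
library(cards)

ard <- ard_continuous(
  ADSL,
  by = ARM,         # one set of statistics per treatment arm
  variables = AGE   # the variable being summarised
)
ard
```

Because the result is plain data, it can be checked, stored or combined across studies before any presentation layer is applied.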

{cardx}

Extending the {cards} package, {cardx} facilitates the creation of Analysis Results Data Objects (ARDs) in R by leveraging utility functions from {cards} and statistical methods from packages such as {stats} and {emmeans}. These ARDs enable the generation of tables and visualisations for regulatory submissions, support quality control checks by storing both results and parameters, and allow for reproducible analyses through the inclusion of function inputs. Installation options include CRAN and GitHub, with examples demonstrating its use in t-tests and regression models. External statistical library dependencies are not enforced by the package, so tools like {renv} need explicit references in code in order to track them.

{chevron}

A collection of high-level functions for generating standardised outputs in clinical trials reporting, {chevron} covers a broad range of output types including tables for safety summaries, adverse events, demographics, ECG results, laboratory findings, medical history, response data, time-to-event analyses and vital signs, as well as listings and graphs such as Kaplan-Meier and mean plots. Straightforward implementation with limited parameterisation is a defining characteristic of the package. It is available on CRAN, with a development version accessible via GitHub, and those requiring greater flexibility are directed to the related {tern} package and its associated catalogue.

{clinify}

Built on the {flextable} and {officer} packages, {clinify} streamlines the creation of clinical tables, listings and figures whilst addressing challenges such as adherence to organisational reporting standards, the need for flexibility across different clients and the importance of reusable configurations. Compatibility with existing tools is a key priority, ensuring that its features do not interfere with the core functionalities of {flextable} or {officer}, whilst enabling tasks like dynamic page breaks, grouped headers and customisable formatting. Complex documents such as Word files with consistent layouts and tailored elements like footnotes and titles can be produced with reduced effort by building on these established frameworks.

{connector}

Offering a unified interface for establishing connections to various data sources, the {connector} package covers file systems and databases through a central configuration file that maintains consistent references across project scripts and facilitates switching between data sources. Functions such as `connector_fs()` for file system access and `connector_dbi()` for database connections are provided, with additional expansion packages enabling integration with specific platforms like Databricks and SharePoint. Installation is available via CRAN or GitHub, and usage involves defining a YAML configuration file to specify connection details that can then be initialised and utilised to interact with data sources. Operations including reading, writing and listing content are supported, with methods for managing connections and handling data in formats like parquet.
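The configuration-driven workflow might look something like the sketch below. Note that the YAML schema shown in the comment is an illustrative guess rather than a verified example, and `connect()` is assumed here to be the function that initialises connections from the file; consult the package documentation for the exact layout.

```r
# Sketch only: the YAML schema in this comment is illustrative, not verified
#
# _connector.yml might contain something along these lines:
#   datasources:
#     - name: "adam"
#       backend:
#         type: "connector_fs"
#         path: "data/adam"

library(connector)

# Initialise every connection defined in the configuration file
db <- connect("_connector.yml")
```

The appeal is that scripts refer to named data sources, so pointing a project at a database instead of a folder becomes a configuration change rather than a code change.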

{covtracer}

Linking test traces to package code and documentation using coverage data from {covr}, the {covtracer} package enables the creation of a traceability matrix that maps tests to specific documented functions. Installation is via remotes from GitHub with specific dependencies, and configuration of {covr} is required to record tests alongside coverage traces. Untested behaviours can be identified and the direct testing of functions assessed, providing insights into test coverage and software validation. The example workflow demonstrates generating a matrix to show which tests evaluate code related to documented behaviours, highlighting gaps in test coverage.

{datacutr}

An open-source solution for applying data cuts to SDTM datasets within R, the {datacutr} package is designed to support pharmaceutical data analysis workflows. Available via CRAN or GitHub, it offers options for different types of cuts tailored to specific SDTM domains. Supplemental qualifiers are assumed to be merged with their parent domain before processing, allowing users flexibility in defining cut types such as patient, date, or domain-specific cuts. Documentation, contribution guidelines and community support through platforms like Slack and GitHub provide further assistance.

{datasetjson}

Facilitating the creation and manipulation of CDISC Dataset JSON formatted datasets, the {datasetjson} R package enables users to generate structured data files by applying metadata attributes to data frames. Metadata such as file paths, study identifiers and system details can be incorporated into dataset objects and written to disk or returned as JSON text. Reading JSON files back into data frames is also supported, with metadata preserved as attributes for use in analysis. The package currently supports version 1.1.0 of the Dataset JSON standard and is available via CRAN or GitHub.

{dataviewR}

An interactive data viewer for R, {dataviewR} enhances data exploration through a Shiny-based interface that enables users to examine data frames and tibbles with tools for filtering, column selection and generating reproducible {dplyr} code. Viewing multiple datasets simultaneously is supported, and the tool provides metadata insights alongside features for importing and exporting data, all within a responsive and user-friendly design. By combining intuitive navigation with automated code generation, the package aims to streamline data analysis workflows and improve the efficiency of dataset manipulation and documentation.

{docorator}

Generating formatted documents by adding headers, footers and page numbers to displays such as tables and figures, {docorator} exports outputs as PDF or RTF files. Accepted inputs include tables created with the {gt} package, figures generated using {ggplot2}, or paths to existing PNG files, and users can customise document elements like titles and footers. The package can be installed from CRAN or via GitHub, and its use involves creating a display object with specified formatting options before rendering the output. LaTeX libraries are required for PDF generation.

{envsetup}

Providing a configuration system for managing R project environments, the {envsetup} package enables adaptation to different deployment stages such as development, testing and production without altering code. YAML files are used to define paths for data and output directories, and R scripts are automatically sourced from specified locations to reduce the need for manual configuration changes. This approach supports consistent code usage across environments whilst allowing flexibility in environment-specific settings, streamlining workflows for projects requiring multiple deployment contexts.

{ggsurvfit}

Simplifying the creation of survival analysis visualisations using {ggplot2}, the {ggsurvfit} package offers tools to generate publication-ready figures with features such as confidence intervals, risk tables and quantile markers. Seamless integration with {ggplot2} functions allows for extensive customisation of plot elements whilst maintaining alignment between graphical components and annotations. Competing risks analysis is supported through `ggcuminc()`, and specific functions such as `Surv_CNSR()` handle CDISC ADaM `ADTTE` data by adjusting event coding conventions to prevent errors. Installation options are available via CRAN or GitHub, with examples and further resources accessible through its documentation and community links.
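A short sketch of the typical pattern, assuming {ggsurvfit} is installed; `df_lung` is example data bundled with the package.

```r
# Sketch: a Kaplan-Meier figure with a confidence band and risk table
library(ggsurvfit)

survfit2(Surv(time, status) ~ sex, data = df_lung) |>
  ggsurvfit() +
  add_confidence_interval() +
  add_risktable()
```

Because the result is a {ggplot2} object, further styling with `labs()`, `theme()` and friends works as it would on any other plot.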

{gridify}

Addressing challenges in creating consistent and customisable graphical arrangements for figures and tables, the {gridify} package leverages the base {grid} package to facilitate the addition of headers, footers, captions and other contextual elements through predefined or custom layouts. Multiple input types are supported, including {ggplot2}, {flextable} and base R plots, and the workflow involves generating an object, selecting a layout and using functions to populate text elements before rendering the final output. Installation options include CRAN and GitHub, with examples demonstrating its application in enhancing tables with metadata and formatting. Uniformity across different projects is promoted, reducing manual adjustments and aligning visual elements consistently.

{gtsummary}

Offering a streamlined approach to generating publication-quality analytical and summary tables in R, the {gtsummary} package enables users to summarise datasets, regression models and other statistical outputs with minimal code. Variable types are identified automatically, relevant descriptive statistics computed and measures of data incompleteness included, whilst customisation of table formatting such as adjusting labels, adding p-values or merging tables for comparative analysis is also supported. Integration with packages like {broom} and {gt} facilitates the creation of visually appealing tables, and results can be exported to multiple formats including HTML, Word and LaTeX, making the package suitable for reproducible reporting in academic and professional contexts.
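The minimal-code claim is easy to illustrate; this sketch assumes {gtsummary} is installed and uses the `trial` dataset that ships with the package.

```r
# Sketch: a demographics-style summary table split by treatment arm
library(gtsummary)

trial |>
  tbl_summary(
    by = trt,                          # one column per treatment arm
    include = c(age, grade, response)  # variables to summarise
  ) |>
  add_p() |>                           # p-values comparing arms
  add_overall()                        # pooled column across arms
```

Variable types, statistics and missing-value counts are all chosen automatically, which is where the bulk of the time saving comes from.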

{logrx}

Supporting logging in clinical programming environments, the {logrx} package generates detailed logs for R scripts, ensuring code execution is traceable and reproducible. An overview of script execution and the associated environment is provided, enabling users to recreate conditions for verification or further analysis. Available on CRAN, installation is possible via standard methods or from its development repository, offering flexibility for both file-based and scripted usage. Structured logging tailored to the specific requirements of clinical applications is the defining characteristic of the package, with simplicity and minimal intrusion in coding workflows maintained throughout.
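Usage is deliberately simple; in this sketch, which assumes {logrx} is installed, the script path is illustrative.

```r
# Sketch: run a script and write an execution log alongside it
library(logrx)

axecute("programs/adsl.R")
```

The resulting log captures the session information, package versions and any warnings or errors raised during execution, giving the traceability described above without changes to the script itself.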

{metacore}

Providing a standardised framework for managing metadata within R sessions, the {metacore} package is particularly suited to clinical trial data analysis. Metadata is organised into six interconnected tables covering dataset specifications, variable details, value definitions, derivations, code lists and supplemental information, ensuring consistency and ease of access. By centralising metadata in a structured, immutable format, the package facilitates the development of tools that can leverage this information across different workflows, reducing the need for redundant data structures. Reading metadata from various sources, including Define-XML 2.0, is also supported.

{metatools}

Working with {metacore} objects, {metatools} enables users to build datasets, enhance columns in existing datasets and validate data against metadata specifications. Installation is available from CRAN or via GitHub. Core functionality includes pulling columns from existing datasets, creating new categorical variables, converting columns to factors and running checks to verify that data conforms to control terminology and that all expected variables are present.
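The two packages are typically used together along these lines; the specification file name and source dataset are illustrative assumptions, with {metacore}, {metatools} and {pharmaversesdtm} assumed to be installed.

```r
# Sketch only: build and check a dataset from a metadata specification
library(metacore)
library(metatools)

# Read a specification file into a metacore object, then subset to ADSL
spec <- spec_to_metacore("specs/adsl_spec.xlsx") |>
  select_dataset("ADSL")

# Pull predecessor columns from the source data named in the specification
adsl <- build_from_derived(spec, ds_list = list("dm" = pharmaversesdtm::dm))

# Check values against the specification's controlled terminology
check_ct_data(adsl, spec)
```

Driving both construction and validation from the same metadata object is what keeps the dataset and its specification from drifting apart.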

{pharmaRTF}

Developed to address gaps in RTF output capabilities within R, {pharmaRTF} is a package for pharmaceutical industry programmers who produce RTF documents for clinical trial data analysis. Whilst the {huxtable} package offers extensive RTF styling and formatting options, it lacks the ability to set document properties such as page size and orientation, repeat column headers across pages, or create multi-level titles and footnotes within document headers and footers. These limitations are resolved by {pharmaRTF}, which wraps around {huxtable} tables to provide document property controls, proper multipage display and title and footnote management within headers and footers. Two core objects form the basis of the package: `rtf_doc` for document-wide attributes and `hf_line` for creating individual title and footnote lines, each carrying formatting properties such as alignment, font and bold or italic styling. Default output files use Courier New at 12-point size, Letter page dimensions in landscape orientation with one-inch margins, though all of these can be adjusted through property functions. The package is available on CRAN and supports both a {tidyverse} piping style and a more traditional assignment-based coding approach.
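The two core objects come together roughly as follows; this sketch assumes {huxtable} and {pharmaRTF} are installed, and the table content and file name are illustrative.

```r
# Sketch: wrap a huxtable in an RTF document with titles and footnotes
library(huxtable)
library(pharmaRTF)

ht <- huxtable(head(mtcars[, 1:3]))

doc <- rtf_doc(ht) |>
  add_titles(hf_line("Table 14.1.1: Illustrative Output", bold = TRUE)) |>
  add_footnotes(hf_line("Source: mtcars example data", italic = TRUE))

write_rtf(doc, file = "table_14_1_1.rtf")
```

Because the titles and footnotes live in the document header and footer, they repeat correctly when a table spills over multiple pages.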

{pharmaverseadam}

Serving as a repository for ADaM test datasets generated by executing templates from related packages such as {admiral} and its extensions, the {pharmaverseadam} package automates dataset creation through a script that installs required packages, runs templates and saves results. Metadata is managed centrally in an XLSX file to ensure consistency in documentation, and updates occur regularly or ad hoc when templates change. Documentation is generated automatically from metadata and saved as `.R` files, and the package includes contributions from multiple developers with examples provided for each dataset. Updates involve preparing metadata, amending configuration files for new therapeutic areas and executing a script to generate datasets and documentation, which keeps everything aligned with the latest versions of dependent packages. Installation is available via CRAN or GitHub.

{pharmaverseraw}

Providing raw datasets to support the creation of SDTM datasets, the {pharmaverseraw} package includes examples that are independent of specific electronic data capture systems or data standards such as CDASH. Datasets are named using SDTM domain identifiers with the suffix `_raw`, and installation options include CRAN or direct GitHub access. Updates involve contributing via GitHub issues, generating new or modified datasets through standalone R scripts stored in the `data-raw` folder, and ensuring generated files are saved in the `data` folder as `.rda` files with consistent naming. Documentation is maintained in `R/*.R` files, and changes require updating `NAMESPACE` and `.Rd` files using `devtools::document()`.

{pharmaversesdtm}

A collection of test datasets formatted according to the SDTM standard, the {pharmaversesdtm} package is designed for use within the pharmaverse family of packages. Datasets applicable across therapeutic areas, such as `DM` and `VS`, are included alongside those specific to particular areas, like `RS` and `OE`. Available via CRAN and GitHub, the package provides installation instructions for both stable and development versions, with test data sourced from the CDISC pilot project and ad-hoc datasets generated by the {admiral} team. Naming conventions distinguish between general and therapeutic area-specific categories, with examples such as `dm` for general use and `rs_onco` for oncology-specific data. Updates involve creating or modifying R scripts in the `data-raw` folder, generating `.rda` files and updating metadata in a central JSON file to automate documentation and maintain consistency, including specifying dataset details like labels, descriptions and therapeutic areas.

{pkglite} (R)

Converting R package source code into text files and reconstructing package structures from those files, {pkglite} enables the exchange and management of R packages as plain text. Single or multiple packages can be processed through functions that collate, pack and unpack files, with installation options available via CRAN or GitHub. The tool adheres to a defined format for text files and includes documentation for generating specifications and managing file collections.

{pkglite} (Python)

An open-source framework licensed under the MIT licence, {pkglite} for Python allows source projects written in any programming language to be packed into portable text files and restored to their original directory structure. Installation is available via PyPI or as a development version cloned from GitHub, and the package can also be run without installation using `uvx`. A command-line interface is provided in addition to the Python API, and the CLI can be installed globally using `pipx`.

{rhino}

Streamlining the development of high-quality, enterprise-grade Shiny applications, {rhino} integrates software engineering best practices, modular code structures and robust testing frameworks. Scalable architecture is supported through modularisation, code quality is enhanced with unit and end-to-end testing, and automation is facilitated via tools for project setup, continuous integration and dependency management. Comprehensive documentation is divided into tutorials, explanations and guides, with examples and resources available for learning.

{risk.assessr}

Evaluating the reliability and security of R packages during validation, the {risk.assessr} package analyses maintenance, documentation and dependencies through metrics such as R CMD check results, unit test coverage and dependency assessments. A traceability matrix linking functions to tests is generated, and risk profiles are based on predefined thresholds including documentation completeness, licence type and code coverage. The tool supports installation from GitHub or CRAN, processes local package files or `renv.lock` dependencies and offers detailed outputs such as risk analysis, dependency lists and reverse dependency information. Advanced features include identifying potential issues in suggested package dependencies and generating HTML reports for risk evaluation, with applications in clinical trial workflows and package validation processes.

{riskassessment}

Built on the {riskmetric} framework, the {riskassessment} application offers a user-friendly interface for evaluating the risk of using R packages within regulated industries, assessing development practices, documentation and sustainability. Non-technical users can review {riskmetric} outputs, add personalised comments, categorise packages into risk levels, generate reports and store assessments securely, with features such as user authentication and role-based access. Alignment with validation principles outlined by the R Validation Hub supports decision-making in regulated settings, though deeper software inspection may be required in some cases. Deployment is possible using tools like Shiny Server or Posit Connect, with installation options including GitHub and local configuration via {renv}.

{riskmetric}

Providing a framework for evaluating the quality of R packages, the {riskmetric} package assesses development practices, documentation, community engagement and sustainability through a series of metrics. Currently operating in a maintenance-only phase, further development is focused on a new tool called {val.metre}. The workflow involves retrieving package information, assessing it against predefined criteria and generating a risk score, with installation available from CRAN or GitHub. An associated application, {riskassessment}, offers a user interface for organisations to review and manage package risk assessments, store metrics and apply organisational rules.

{rlistings}

Designed to create and display formatted listings with a focus on ASCII rendering for tables and regulatory-ready outputs, the {rlistings} R package relies on the {formatters} package for formatting infrastructure. Requirements such as flexible pagination, multiple output formats and repeated key columns informed its development. Available on CRAN and GitHub, the package is under active development and includes features such as adjustable column widths, alignment and support for titles and footnotes.

{rtables}

Tailored for generating submission-ready tables for health authority review, the {rtables} R package creates and displays complex tables with advanced formatting and output options that support regulatory requirements for clinical trial data presentation. Separation of data values from their visualisation is enabled, multiple values can be included within cells, and flexible tabulation and formatting capabilities are provided, including cell spans, rounding and alignment. Output formats include HTML, ASCII, LaTeX, PDF and PowerPoint, with additional formats under development. Also, the package incorporates features such as pagination, distinction between data names and labels for CDISC standards and support for titles and footnotes. Installation is available via CRAN or GitHub, with ongoing community support and training resources.
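The layered tabulation approach can be sketched briefly; this assumes {rtables} is installed and uses the `ex_adsl` example data that ships with the package.

```r
# Sketch: define a layout, then build the table from data
library(rtables)

lyt <- basic_table() |>
  split_cols_by("ARM") |>                       # columns by treatment arm
  analyze("AGE", afun = mean, format = "xx.x")  # mean age per arm

build_table(lyt, ex_adsl)
```

Separating the layout (`lyt`) from the data is what allows the same table definition to be reused across datasets and studies.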

{rtflite}

A lightweight Python library focused on precise formatting of production-quality tables and figures, {rtflite} is designed for composing RTF documents. Installation is available via PyPI or directly from its GitHub repository, with optional dependencies available to enable DOCX assembly support and RTF-to-PDF or RTF-to-DOCX conversion via LibreOffice.

{sdtm.oak}

Offering a modular, open-source solution for generating CDISC SDTM datasets, the {sdtm.oak} R package is designed to work across different electronic data capture systems and data standards. Industry challenges related to inconsistent raw data structures and varying data collection practices are addressed through reusable algorithms that map raw datasets to SDTM domains, with current capabilities covering Findings, Events and Intervention classes. Future developments aim to expand domain support, introduce metadata-driven code generation and enhance automation potential, though sponsor-specific metadata management tasks are not yet handled by the package. Available on CRAN and GitHub, development is ongoing with refinements based on user feedback and evolving SDTM requirements.

{sdtmchecks}

Providing functions to detect common data issues in SDTM datasets, the {sdtmchecks} package is designed to be broadly applicable and useful for analysis. Installation is available from CRAN or via GitHub, with development versions accessible through specific repositories, and users are not required to specify SDTM versions. A range of data check functions stored as R scripts is included, and contributions are encouraged that maintain flexibility across different data standards.

{siera}

Facilitating the generation of Analysis Results Datasets (ARDs) by processing Analysis Results Standard (ARS) metadata, the {siera} package works with parameters such as analysis sets, groupings, data subsets and methods. Metadata is typically provided in JSON format and used to generate R scripts automatically; when executed with the corresponding ADaM datasets, these scripts produce ARDs in a structured format. The package can be installed from CRAN or GitHub, and its primary function, `readARS`, requires an ARS file, an output directory and access to relevant ADaM data. The CDISC Analysis Results Standard underpins this process, promoting automation and consistency in analysis outcomes.

{teal}

An open-source, Shiny-based interactive framework for exploratory data analysis, {teal} is developed as part of the pharmaverse ecosystem and maintained by F. Hoffmann-La Roche AG alongside a broad community of contributors. Analytical applications are built by combining supported data types, including CDISC clinical trial data, independent or relational datasets and `MultiAssayExperiment` objects, with modular analytical components known as teal modules. These modules can be drawn from dedicated packages covering general data exploration, clinical reporting and multi-omics analysis and define the specific analyses presented within an application. A suite of companion packages handles logging, reproducibility, data loading, filtering, reporting and transformation. The package is available on CRAN and is under active development, with community support provided through the {pharmaverse} Slack workspace.

{tern}

Supporting clinical trial reporting through a broad range of analysis functions, the {tern} R package offers data visualisation capabilities including line plots, Kaplan-Meier plots, forest plots, waterfall plots and Bland-Altman plots. Statistical model fit summaries for logistic and Cox regression are also provided, along with numerous analysis and summary table functions. Many of these outputs can be integrated into interactive Teal Shiny applications via the {teal.modules.clinical} package.

{tfrmt}

Offering a structured approach to defining and applying formatting rules for data displays in clinical trials, the {tfrmt} package streamlines the creation of mock displays, aligns with industry-standard Analysis Results Data (ARD) formats and integrates formatting tasks into the programming workflow to reduce manual effort and rework. Metadata is leveraged to automate styling and layout, enabling standardised formatting with minimal code, supporting quality control before final output and facilitating the reuse of datasets across different table types. Built on the {gt} package, the tool provides a flexible interface for generating tables and mock-ups, allowing users to focus on data interpretation rather than repetitive formatting tasks.

{tfrmtbuilder}

A tool for defining display-related metadata to streamline the creation and modification of table formats, the {tfrmtbuilder} package supports workflows such as generating tables from scratch, using templates or editing existing ones. Features include a toggle to switch between mock and real data, options to load or create datasets, tools for mapping and formatting data and the ability to export results as JSON, HTML or PNG. Designed for use in study planning and analysis phases, the package allows users to manage table structures efficiently.

{tidyCDISC}

An open-source R Shiny application, {tidyCDISC} is designed to help clinical personnel explore and analyse ADaM-standard data sets without writing any code. Customised clinical tables can be generated through a point-and-click interface, trends across patient populations examined using dynamic figures and individual patient profiles explored in detail. A broad range of users is served, from clinical heads with no programming background to statisticians and statistical programmers, with reported time savings of around 95% for routine trial analysis tasks. The app accepts only `sas7bdat` files conforming to CDISC ADaM standards and includes a feature to export reproducible R scripts from its table generator. A demo version is available without installation using CDISC pilot data, whilst uploading study data requires installing the package from CRAN or via GitHub.

{tidytlg}

Facilitating the creation of tables, listings and graphs using the {tidyverse} framework, the {tidytlg} package offers two approaches: a functional method involving custom scripts for each output and a metadata-driven method that leverages column and table metadata to generate results automatically. Tools for data analysis, including frequency tables and univariate statistics, are included alongside support for exporting outputs to formatted documents.

{Tplyr}

Simplifying the creation of clinical data summaries by breaking down complex tables into reusable layers, {Tplyr} allows users to focus on presentation rather than repetitive data processing. The conceptual approach of {dplyr} is mirrored but applied to common clinical table types, such as counting event-based variables, generating descriptive statistics for continuous data and categorising numerical ranges. Metadata is included with each summary produced to ensure traceability from raw data to final output, and user-acceptance testing documentation is provided to support its use in regulated environments. Installation options are available via CRAN or GitHub, accompanied by detailed vignettes covering features like layer templates, metadata extension and styled table outputs.
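The layering idea looks something like the sketch below, which assumes {Tplyr} is installed; `adsl` stands in for an ADaM subject-level dataset with `TRT01P`, `SEX` and `AGE` columns.

```r
# Sketch only: one count layer and one descriptive-statistics layer
library(Tplyr)

t <- tplyr_table(adsl, TRT01P) |>
  add_layer(group_count(SEX)) |>   # n (%) of each sex per arm
  add_layer(group_desc(AGE))       # summary statistics for age per arm

build(t)
```

Each layer encapsulates one summary type, so a demographics table becomes a stack of small, reusable pieces rather than one bespoke script.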

{valtools}

Streamlining the validation of R packages used in clinical research and drug development, {valtools} offers templates and functions to support tasks such as setting up validation frameworks, managing requirements and test cases and generating reports. Developed by the R Package Validation Framework PHUSE Working Group, the package integrates with standard development tools and provides functions prefixed with `vt` to facilitate structured validation processes including infrastructure setup, documentation creation and automated checks. Generating validation reports, scraping metadata from validation configurations and executing validation workflows through temporary installations or existing packages are all supported.

{whirl}

Facilitating the execution of scripts in batch mode whilst generating detailed logs that meet regulatory requirements, the {whirl} package produces logs including script status, execution timestamps, environment details, package versions and environmental variables, presented in a structured HTML format. Individual or multiple scripts can be run simultaneously, with parallel processing enabled through specified worker counts. A configuration file allows scripts to be executed in sequential steps, ensuring dependencies are respected, and the package produces individual logs for each script alongside a summary log and a tibble summarising execution outcomes. Installation options include CRAN and GitHub, with documentation available for customisation and advanced usage.

{xportr}

Assisting clinical programmers in preparing CDISC compliant XPT files for clinical data sets, the {xportr} package associates metadata with R data frames, performs validation checks and converts data into transportable SAS v5 XPT format. Tools are included to define variable types, set appropriate lengths, apply labels, format data, reorder variables and assign dataset labels, ensuring adherence to standards such as variable naming conventions, character length limits and the absence of non-ASCII characters. A practical example demonstrates how to use a specification file to apply these transformations to an ADSL dataset, ultimately generating a compliant XPT file.
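The workflow chains a series of small functions, each applying one piece of metadata; `var_spec` below stands in for a specification file already read into a data frame, and the dataset name is an assumption for illustration:

```r
library(xportr)
library(dplyr)

# Apply specification metadata step by step, then write the XPT file.
adsl %>%
  xportr_type(var_spec, "ADSL") %>%      # coerce variable types
  xportr_length(var_spec, "ADSL") %>%    # set variable lengths
  xportr_label(var_spec, "ADSL") %>%     # apply variable labels
  xportr_order(var_spec, "ADSL") %>%     # reorder variables
  xportr_format(var_spec, "ADSL") %>%    # attach display formats
  xportr_write("adsl.xpt")               # produce the SAS v5 transport file
```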

Some R packages to explore as you find your feet with the language

24th March 2026

Here are some commonly used R packages and other tools, some of them near-ubiquitous and others that I encountered while getting started with the language, which is itself becoming pervasive in my line of business. The collection grew organically as my explorations proceeded, and it reflects what I was trying out during my acclimatisation.

General

Here are two general packages to get things started, with one of them being unavoidable in the R world. The other is more advanced, possibly offering more to package developers.

{tidyverse}

You cannot use R without knowing about this collection of packages. In many ways, they form a mini-language of their own, drawing some criticism from those who reckon that base R functionality covers a sufficient gamut anyway. Nevertheless, there is so much here that will get you going with data wrangling and visualisation that it is worth knowing what is possible. Indeed, the complaints may stem from the fact that, for these purposes, you scarcely need to use anything else.

{plumber}

This R package enables developers to convert existing R functions into web API endpoints by adding roxygen2-like comment annotations to their code. Once annotated, functions can handle HTTP GET and POST requests, accept query string or JSON parameters and return outputs such as plain values or rendered plots. The package is available on CRAN as a stable release, with a development version hosted on GitHub. For deployment, it integrates with DigitalOcean through a companion package called {plumberDeploy}, and also supports Posit Connect, PM2 and Docker as hosting options. Related projects in the same space include OpenCPU, which is designed for hosting R APIs in scientific research contexts, and the now-discontinued jug package, which took a more programmatic approach to API construction.
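The annotation style follows the package's own introductory examples: special `#*` comments above a function declare the route, parameters and serialiser.

```r
# plumber.R
#* Echo back a message supplied as a query parameter
#* @param msg The message to echo
#* @get /echo
function(msg = "") {
  list(msg = paste0("The message is: '", msg, "'"))
}

#* Return a histogram of random values as a PNG image
#* @serializer png
#* @get /plot
function() {
  hist(rnorm(100))
}
```

The file is then served with `plumber::plumb("plumber.R")$run(port = 8000)`, after which the endpoints respond at `/echo` and `/plot`.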

Data Preparation

You simply cannot avoid working with data during any analysis or reporting work. While there is a learning curve if you are used to other languages, there is little doubt that R is well-endowed when it comes to performing these tasks. Here are some packages that extend base R capabilities and might even add some extra user-friendliness along the way.

{forcats}

The {forcats} package in R provides functions to manage categorical variables by reordering factor levels, collapsing infrequent values and adjusting their sequence based on frequency or other variables. It includes tools such as reordering by another variable, grouping rare categories into 'other' and modifying level order manually, which are useful for data analysis and visualisation workflows. Designed as part of the tidyverse, it integrates with other packages to streamline tasks like counting and plotting categorical data, enhancing clarity and efficiency in handling factors within R.
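The three operations mentioned above each map onto a single function call:

```r
library(forcats)

f <- factor(c("b", "b", "a", "c", "c", "c", "d"))

fct_infreq(f)              # reorder levels by descending frequency
fct_lump_n(f, n = 2)       # keep the two most common levels, lump the rest into "Other"
fct_relevel(f, "c", "a")   # move chosen levels to the front manually
```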

{tidyr}

Around this time last year, I remember completing a LinkedIn course on a set of good practices known as tidy data, where each variable occupies a column, each observation a row and each value a single cell. This package is designed to help users restructure data so it follows those rules. It provides tools for reshaping data between long and wide formats, handling nested lists, splitting or combining columns, managing missing values and layering or flattening grouped data.

Installation options include the {tidyverse} collection, standalone installation, or the development version from GitHub. The package succeeds earlier reshaping tools like {reshape2} and {reshape}, offering a focused approach to tidying data rather than general reshaping or aggregation.
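Reshaping between wide and long layouts is the package's bread and butter, as this round trip with the bundled `relig_income` dataset shows:

```r
library(tidyr)

# relig_income is wide: one column per income band.
long <- pivot_longer(relig_income, -religion,
                     names_to = "income", values_to = "count")

# And back again to one column per income band.
wide <- pivot_wider(long, names_from = income, values_from = count)
```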

{haven}

Given my long track record of working with SAS, {haven} arouses my interest with its ability to read and write data files from statistical software such as SAS, SPSS and Stata, leveraging the ReadStat library. Handily, it supports a range of file formats, including SAS transport and data files, SPSS system and older portable files and Stata data files up to version 15, converting these into tibbles with enhanced printing capabilities. Value labels are preserved as a labelled class, allowing conversion to factors, while dates and times are transformed into standard R classes.
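The reading and writing functions follow a consistent naming scheme; the file names below are illustrative:

```r
library(haven)

adsl <- read_sas("adsl.sas7bdat")   # SAS data file into a tibble
dm   <- read_xpt("dm.xpt")          # SAS transport (XPT) file
svy  <- read_sav("survey.sav")      # SPSS system file

as_factor(svy$region)               # convert labelled values into an R factor
write_xpt(adsl, "adsl_out.xpt")     # write back out in transport format
```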

{RMariaDB}

While there are other approaches to working with databases using R, {RMariaDB} provides a database interface and driver for MariaDB, designed to fully comply with the DBI specification and serve as a replacement for the older {RMySQL} package. It supports connecting to databases using configuration files, executing queries, reading and writing data tables and managing results in chunks. Installation options include binary packages from CRAN or development versions from GitHub, with additional dependencies such as MariaDB Connector/C or libmysqlclient required for Linux and macOS systems. Configuration is typically handled through a MariaDB-specific file, and the package includes acknowledgments for contributions from various developers and organisations.
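Because the driver complies with the DBI specification, the familiar DBI verbs apply; the `group` argument here assumes connection details stored in a MariaDB configuration file rather than hard-coded credentials:

```r
library(DBI)

con <- dbConnect(RMariaDB::MariaDB(), group = "my-db")

dbWriteTable(con, "mtcars", mtcars, overwrite = TRUE)
dbGetQuery(con, "SELECT cyl, COUNT(*) AS n FROM mtcars GROUP BY cyl")
dbDisconnect(con)
```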

COVID-19 Data Hub

For many people, the pandemic may be a fading memory, yet it offered opportunities for learning R, not least because there was a use case with more than a hint of personal interest about it. Here is a library making it easier to get hold of the data, with some added pre-processing too. Memories of how I needed to wrangle what was published by various sources make me appreciate just how vital it is to have harmonised data for analysis work.

Table Production

While many prefer the graphical presentation of results to their tabular display, R has options here too. In recent times, these have improved, particularly because of the pharmaverse initiative. Here is a selection of what I found during my explorations.

{officer}

Part of the {officeverse} along with {officedown}, {flextable}, {rvg} and {mschart}, the {officer} R package enables users to create and modify Word and PowerPoint documents directly from R, allowing the insertion of images, tables and formatted content, as well as the import of document content into data frames. It supports the generation of RTF files and integrates with other packages for advanced features such as vector graphics and native office charts. Installation options include CRAN and GitHub, with community resources available for assistance and contributions. The package facilitates the manipulation of document elements like paragraphs, tables and section breaks and provides tools for exporting and importing content between R and office formats, alongside functions for managing slide layouts and embedded objects in presentations.
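Building a Word document is a matter of chaining content onto an in-memory object before writing it out; the file name here is illustrative:

```r
library(officer)
library(magrittr)

doc <- read_docx() %>%
  body_add_par("Monthly summary", style = "heading 1") %>%
  body_add_par("Figures below are illustrative.") %>%
  body_add_table(head(mtcars))

print(doc, target = "report.docx")   # writes the Word file to disk
```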

{pharmaRTF}

If you work in clinical research like I do, the need to produce data tabulations is a non-negotiable requirement. That is how this package came to be developed, and the pharmaverse of which it is part has numerous other options, should you need to look at using one of those. The flavour of RTF produced here is the Microsoft Word variety, which did not render as well in LibreOffice Writer when I last opened the results with that open-source alternative. Otherwise, the output looks good to many eyes.

{formattable}

Here is a package that enhances data presentation by applying customisable formatting to vectors and data frames, supporting formats such as percentages, currency and accounting. Available on GitHub and CRAN, it integrates with dynamic document tools like {knitr} and {rmarkdown} to produce visually distinct tables, with features including gradient colour scales, conditional styling and icon-based representations. It automatically converts to {htmlwidgets} in interactive environments and is licensed under MIT, enabling flexible use in both static and interactive data displays.
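Formatting rules are supplied per column, as in this small sketch with made-up figures:

```r
library(formattable)

df <- data.frame(unit    = c("North", "South", "West"),
                 growth  = c(0.12, -0.05, 0.31),
                 revenue = c(12000, 9800, 15600))

formattable(df, list(
  growth  = percent,                  # render proportions as percentages
  revenue = color_bar("lightgreen")   # in-cell bars scaled by value
))
```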

{reactable}

The {reactable} package for R provides interactive data tables built on the React Table library, offering features such as sorting, filtering, pagination, grouping with aggregation, virtual scrolling for large datasets and support for custom rendering through R or JavaScript. It integrates seamlessly into R Markdown documents and Shiny applications, enabling the use of HTML widgets and conditional styling. Installation options include CRAN and GitHub, with examples demonstrating its application across various datasets and scenarios. The package supports major web browsers and is licensed under MIT, designed for developers seeking dynamic data presentation tools within the R ecosystem.

{DT}

Particularly useful in dynamic web applications like Shiny, the {DT} package in R provides a means of rendering interactive HTML tables by building on the DataTables JavaScript library. It supports features including sorting, searching, pagination and advanced filtering, with numeric, date and time columns using range-based sliders whilst factor and character columns rely on search boxes or dropdowns. Filtering operates on the client side by default, though server-side processing is also available. JavaScript callbacks can be injected after initialisation to manipulate table behaviour, such as enabling automatic page navigation or adding child rows to display additional detail. HTML content is escaped by default as a safeguard against cross-site scripting attacks, with the option to adjust this on a per-column basis. Whilst the package integrates with Shiny applications, attention is needed around scrolling and slider positioning to prevent layout problems. Overall, the package is well suited to exploratory data analysis and the building of interactive dashboards.
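A single call is enough to get an interactive table with the column filters described above:

```r
library(DT)

# filter = "top" places filters above each column: numeric columns get
# range sliders, character and factor columns get search inputs.
datatable(iris, filter = "top", options = list(pageLength = 5))
```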

{gt}

The {gt} package in R enables users to create well-structured tables with a variety of formatting options, starting from data frames or tibbles and incorporating elements such as headers, footers and customised column labels. It supports output in HTML, LaTeX and RTF formats and includes example datasets for experimentation. The package prioritises simplicity for common tasks while offering advanced functions for detailed customisation, with installation available via CRAN or GitHub. Users can access resources like documentation, community forums and example projects to explore its capabilities, and it is supported by a range of related packages that extend its functionality.
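Table construction starts from a data frame and layers on the parts of the table one function at a time:

```r
library(gt)

head(mtcars) |>
  gt(rownames_to_stub = TRUE) |>
  tab_header(title = "Motor Trend cars", subtitle = "First six rows") |>
  fmt_number(columns = c(mpg, wt), decimals = 1)
```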

{gtsummary}

Enabling users to produce publication-ready outputs with minimal code, the {gtsummary} package offers a streamlined approach to generating analytical and summary tables in R. It automates the summarisation of data frames, regression models and other datasets, identifying variable types and calculating relevant statistics, including measures of data incompleteness. Customisation options allow for formatting, merging and styling tables to suit specific needs, while integration with packages such as {broom} and {gt} facilitates seamless incorporation into R Markdown workflows. The package supports the creation of side-by-side regression tables and provides tools for exporting results as images, HTML, Word, or LaTeX files, enhancing flexibility for reporting and sharing findings.
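How little code is needed can be seen with the `trial` example dataset bundled with the package:

```r
library(gtsummary)

trial[c("trt", "age", "grade")] |>
  tbl_summary(by = trt) |>   # summarise each variable by treatment arm
  add_p()                    # append p-values from appropriate tests
```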

{huxtable}

Here is an R package designed to generate LaTeX and HTML tables with a modern, user-friendly interface, offering extensive control over styling, formatting, alignment and layout. It supports features such as custom borders, padding, background colours and cell spanning across rows or columns, with tables modifiable using standard R subsetting or dplyr functions. Examples demonstrate its use for creating simple tables, applying conditional formatting and producing regression output with statistical details. The package also facilitates quick export to formats like PDF, DOCX, HTML and XLSX. Installation options include CRAN, R-Universe and GitHub, while the name reflects its origins as an enhanced version of the {xtable} package. The logo was generated using the package itself, and the background design draws inspiration from Piet Mondrian’s artwork.

Figure Generation

R has such a reputation for graphical presentations that it is cited as a strong reason to explore what the ecosystem has to offer. While base R itself is not shabby when it comes to creating graphs and charts, these packages will extend things by quite a way. In fact, the first on this list is near enough pervasive.

{ggplot2}

Though its default formatting does not appeal to me, the myriad of options makes this a very flexible tool, albeit at the expense of some code verbosity. Multi-panel plots are not among its strengths, which may send you elsewhere for that need.
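Overriding that default look is straightforward, since themes and labels are just more layers in the grammar:

```r
library(ggplot2)

ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) +
  geom_point(size = 2) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders") +
  theme_minimal()   # swap out the default grey theme
```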

{ggforce}

Focusing on features not included in the core library, the {ggforce} package extends {ggplot2} by offering additional tools to enhance data visualisation. Designed to complement the primary role of {ggplot2} in exploratory data analysis, it provides a range of geoms, stats and other components that are well-documented and implemented, aiming to support more complex and custom plot compositions. Available for installation via CRAN or GitHub, the package includes a variety of functionalities described in detail on its associated website, though specific examples are not included here.

{cowplot}

Developed by Claus O. Wilke for internal use in his lab, {cowplot} is an R package designed to help with the creation of publication-quality figures built on top of {ggplot2}. It provides a set of themes, tools for aligning and arranging plots into compound figures and functions for annotating plots or combining them with images. The package can be installed directly from CRAN or as a development version via GitHub, and it has seen widespread use in the book Fundamentals of Data Visualisation.

{sjPlot}

The {sjPlot} package provides a range of tools for visualising data and statistical results commonly used in social science research, including frequency tables, histograms, box plots, regression models, mixed effects models, PCA, correlation matrices and cluster analyses. It supports installation via CRAN for stable releases or through GitHub for development versions, with documentation and examples available online. The package is licensed under GPL-3 and developed by Daniel Lüdecke, offering functions to create visualisations such as scatter plots, Likert scales and interaction effect plots, along with tools for constructing index variables and presenting statistical outputs in tabular formats.

{thematic}

By offering a centralised approach to theming and enabling automatic adaptation of plot styles within Shiny applications, the {thematic} package simplifies the styling of R graphics, including {ggplot2}, {lattice} and base R plots, R Markdown documents and RStudio. It allows users to apply consistent visual themes across different plotting systems, with auto-theming in Shiny and R Markdown relying on CSS and {bslib} themes, respectively. Installation requires specific versions of dependent packages such as {shiny} and {rmarkdown}, while custom fonts benefit from {showtext} or {ragg}. Users can set global defaults for background, foreground and accent colours, as well as fonts, which can be overridden with plot-specific theme adjustments. The package also defines default colour scales for qualitative and sequential data and integrates with tools like {bslib} to import Google Fonts, enhancing visual consistency across different environments and user interfaces.

Publishing Tools

The R ecosystem goes beyond mere graphical and tabular display production to offer means for taking things much further, often offering platforms for publishing your work. These can be used locally too, so there is no need to entrust everything to a third-party provider. The uses are endless for what is available, and it appears that Posit has used this to help with building documentation and training too.

R Markdown

What you have here is one of those distinguishing facilities of the R ecosystem, particularly for those wanting to share their analysis work with more than a hint of reproducibility. The tool combines narrative text and code to generate various outputs, supporting multiple programming languages and formats such as HTML, PDF and dashboards. It enables users to produce reports, presentations and interactive applications, with options for publishing and scheduling through platforms like RStudio Connect, facilitating collaboration and distribution of results in professional settings.

Distill for R Markdown

Distill for R Markdown is a tool designed to streamline the creation of technical documents, offering features such as code folding, syntax highlighting and theming. It builds on existing frameworks like Pandoc, MathJax and D3, enabling the production of dynamic, interactive content. Users can customise the appearance with CSS and incorporate appendices for supplementary information. The tool acknowledges the contributions of developers who created foundational libraries, ensuring accessibility and functionality for a wide audience. Its design prioritises clarity, allowing authors to focus on presenting results rather than underlying code, while maintaining flexibility for those who wish to include detailed explanations.

{shiny}

For a while, this was one of R's unique selling points, and it remains a compelling reason to use the language even now that Python has a version of the package. By enabling the creation of interactive web applications for data analysis without requiring web development expertise, it allows users to build interfaces that let others explore data through dynamic visualisations and filters. Here is a simple example: an app that generates scatter plots with adjustable variables, species filters and marginal plots, hosted either on personal servers or through a dedicated hosting service.
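As a flavour of how little code a basic app needs (a simpler sketch than the scatter plot app just described):

```r
library(shiny)

ui <- fluidPage(
  selectInput("var", "Variable", names(mtcars)),
  plotOutput("hist")
)

server <- function(input, output) {
  # The plot re-renders automatically whenever the selected variable changes.
  output$hist <- renderPlot(
    hist(mtcars[[input$var]], main = input$var, xlab = input$var)
  )
}

shinyApp(ui, server)
```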

{bslib}

The {bslib} R package offers a modern user interface toolkit for Shiny and R Markdown applications, leveraging Bootstrap to enable the creation of customisable dashboards and interactive theming. It supports the use of updated Bootstrap and Bootswatch versions while maintaining compatibility with existing defaults, and provides tools for real-time visual adjustments. Installation is available through CRAN, with example previews demonstrating its capabilities.

{rhandsontable}

Enabling users to manipulate and validate data within a spreadsheet-like interface, the {rhandsontable} package introduces an interactive data grid for R. It supports features such as custom cell rendering, validation rules and integration with Shiny applications. When used in Shiny, the widget requires explicit conversion of data using the `hot_to_r()` function, as updates may not be immediately reflected in reactive contexts. Examples demonstrate its application in various scenarios, including date editing, financial calculations and dynamic visualisations linked to charts. The package also accommodates bookmarks in Shiny apps with specific handling. Users are encouraged to report issues or contribute improvements, with guidance provided for those seeking to expand its functionality.
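The round trip through `hot_to_r()` can be sketched with a tiny Shiny app that recalculates a total as the grid is edited:

```r
library(shiny)
library(rhandsontable)

ui <- fluidPage(
  rHandsontableOutput("grid"),
  verbatimTextOutput("total")
)

server <- function(input, output) {
  output$grid <- renderRHandsontable(
    rhandsontable(data.frame(qty = c(1, 2), price = c(10, 20)))
  )
  output$total <- renderPrint({
    req(input$grid)
    df <- hot_to_r(input$grid)   # convert widget state back to a data frame
    sum(df$qty * df$price)
  })
}

shinyApp(ui, server)
```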

{xaringanExtra}

{xaringanExtra} offers a range of enhancements and extensions for creating and presenting slides with xaringan, enabling features such as adding an overview tile view, making slides editable, broadcasting in real time, incorporating animations, embedding live video feeds and applying custom styles. It allows users to selectively activate individual tools or load multiple features simultaneously through a single function call, supporting tasks like adding banners, enabling code copying, fitting slides to screen dimensions and integrating utility toolkits. The package is available for installation via CRAN or GitHub, providing flexibility for developers and presenters seeking to expand the functionality of their slides.

Online R programming books that are worth bookmarking

23rd March 2026

As part of making content more useful following its reorganisation, numerous articles on the R statistical computing language have appeared on here. All of those have taken a more narrative form. With this collation of online books on the R language, I take a different approach. What you find below is a collection of links with associated descriptions. While narrative accounts can be very useful, there is something handy about running one's eye down a compilation as well. Many entries have a corresponding print edition, some of which are not cheap to buy, which makes me wonder about the economics of posting the content online as well, though it can help with getting feedback during book preparation.

Big Book of R

We start with this comprehensive collection of over 400 free and affordable resources related to the R programming language, organised into categories such as data science, statistics, machine learning and specific fields like economics and life sciences. In many ways, it is a superset of what you find below and complements this collection with many other finds. The fact that it is a living collection makes it even more useful.

R Programming for Data Science

Here is an introduction to the R programming language, focusing on its application in data science. It covers foundational topics such as installation, data manipulation, function writing, debugging and code optimisation, alongside advanced concepts like parallel computation and data analysis case studies. The text includes practical guidance on handling data structures, using packages such as {dplyr} and {readr} as well as working with dates, times and regular expressions. Additional sections address control structures, scoping rules and profiling techniques, while the author also discusses resources for staying updated through a podcast and accessing e-book versions for ongoing revisions.

Hands-On Programming with R

Designed for individuals with no prior coding experience, the book provides an introduction to programming in R while using practical examples to teach fundamental concepts such as data manipulation, function creation and the use of R's environment system. It is structured around hands-on projects, including simulations of weighted dice, playing cards and a slot machine, alongside explanations of core programming principles like objects, notation, loops and performance optimisation. Additional sections cover installation, package management, data handling and debugging techniques. While the book is written using RMarkdown and published under a Creative Commons licence, a physical edition is available through O’Reilly.

Advanced R

What you have here is one of several books written by Hadley Wickham. This one is published in its second edition as part of Chapman and Hall's R Series and is aimed primarily at R users who want to deepen their programming skills and understanding of the language, though it is also useful for programmers migrating from other languages. The book covers a broad range of topics organised into sections on foundations, functional programming, object-oriented programming, metaprogramming and techniques, with the latter including debugging, performance measurement and rewriting R code in C++.

Cookbook for R

Unlike Paul Teetor's separately published R Cookbook, the Cookbook for R was created by Winston Chang. It offers solutions to common tasks and problems in data analysis, covering topics such as basic operations, numbers, strings, formulas, data input and output, data manipulation, statistical analysis, graphs, scripts and functions, and tools for experiments.

R for Data Science

The second edition of R for Data Science by Hadley Wickham, Mine Çetinkaya-Rundel and Garrett Grolemund offers a structured approach to learning data science with R, covering essential skills such as data visualisation, transformation, import, programming and communication. Organised into chapters that explore workflows, data manipulation techniques and tools like Quarto for reproducible research, the book emphasises practical applications and best practices for handling data effectively.

R Graphics Cookbook

The R Graphics Cookbook, 2nd edition, offers a comprehensive guide to creating visualisations in R, structured into chapters that cover foundational skills such as installing and using packages, loading data from various formats and exploring datasets through basic plots. It progresses to detailed techniques for constructing bar graphs, line graphs, scatter plots and histograms, alongside methods for customising axes, annotations, themes and legends.

The book also addresses advanced topics like colour application, faceting data into subplots, generating specialised graphs such as network diagrams and heat maps and preparing data for visualisation through reshaping and summarising. Additional sections focus on refining graphical outputs for presentation, including exporting to different file formats and adjusting visual elements for clarity and aesthetics, while an appendix provides an overview of the {ggplot2} system.

R Markdown: The Definitive Guide

Published by Chapman & Hall/CRC, R Markdown: The Definitive Guide by Yihui Xie, J.J. Allaire and Garrett Grolemund covers the R Markdown document format, which has been in use since 2012 and is built on the knitr and Pandoc tools. The format allows users to embed code within Markdown documents and compile the results into a range of output formats including PDF, HTML and Word. The guide covers a broad scope of practical applications, from creating presentations, dashboards, journal articles and books to building interactive applications and generating blogs, reflecting how the ecosystem has matured since the {rmarkdown} package was first released in 2014.

A key principle running throughout is that Markdown's deliberately limited feature set is a strength rather than a drawback, encouraging authors to focus on content rather than complex typesetting. Despite this simplicity, the format remains highly customisable through tools such as Pandoc templates, LaTeX and CSS. Documents produced in R Markdown are also notably portable, as their straightforward syntax makes conversion between output formats more reliable, and because results are generated dynamically from code rather than entered manually, they are far more reproducible than those produced through conventional copy-and-paste methods.

R Markdown Cookbook

The R Markdown Cookbook is a practical guide designed to help users enhance their ability to create dynamic documents by combining analysis and reporting. It covers essential topics such as installation, document structure, formatting options and output formats like LaTeX, HTML and Word, while also addressing advanced features such as customisations, chunk options and integration with other programming languages. The book provides step-by-step solutions to common tasks, drawing on examples from online resources and community discussions to offer clear, actionable advice for both new and experienced users seeking to improve their workflow and explore the full potential of R Markdown.

RMarkdown for Scientists

This book provides a practical guide to using R Markdown for scientists, developed from a three-hour workshop and designed to evolve as a living resource. It covers essential topics such as setting up R Markdown documents, integrating with RStudio for efficient workflows, exporting outputs to formats like PDF, HTML and Word, managing figures and tables with dynamic references and captions, incorporating mathematical equations, handling bibliographies with citations and style adjustments, troubleshooting common issues and exploring advanced R Markdown extensions.

bookdown: Authoring Books and Technical Documents with R Markdown

Here is a guide to using the {bookdown} package, which extends R Markdown to facilitate the creation of books and technical documents. It covers Markdown syntax, integration of R code, formatting options for HTML, LaTeX and e-book outputs and features such as cross-referencing, custom blocks and theming. The package supports both multipage and single-document outputs, and its applications extend beyond traditional books to include course materials, manuals and other structured content. The work includes practical examples, publishing workflows and details on customisation, alongside information about licensing and the availability of a printed version.

[blogdown]: Creating Websites with R Markdown

Though the authors note that some information may be outdated due to recent updates to Hugo and the {blogdown} package, directing readers to additional resources for the latest features and changes, this book still provides a guide to building static websites using R Markdown and the Hugo static site generator, emphasising the advantages of this approach for creating reproducible, portable content. It covers installation, configuration, deployment options such as Netlify and GitHub Pages, migration from platforms like WordPress, and advanced topics including custom layouts and version control, alongside practical examples, workflow recommendations and discussions of themes, content management and the technical aspects of website development.

[pagedown]: Create Paged HTML Documents for Printing from R Markdown

The R package {pagedown} enables users to create paged HTML documents suitable for printing to PDF, using R Markdown combined with a JavaScript library called paged.js, the latter of which implements W3C specifications for paged media. While tools like LaTeX and Microsoft Word have traditionally dominated PDF production, {pagedown} offers an alternative approach through HTML and CSS, supporting a range of document types including resumes, posters, business cards, letters, theses and journal articles.

Documents can be converted to PDF via Google Chrome, Microsoft Edge or Chromium, either manually or through the chrome_print() function, with additional support for server-based, CI/CD pipeline and Docker-based workflows. The package provides customisable CSS stylesheets, a CSS overriding mechanism for adjusting fonts and page properties, and various formatting features such as lists of tables and figures, abbreviations, footnotes, line numbering, page references, cover images, running headers, chapter prefixes and page breaks. Previewing paged documents requires a local or remote web server, and the layout is sensitive to browser zoom levels, with 100% zoom recommended for the most accurate output.
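The render-then-print workflow can be sketched in two lines; the file names are illustrative, and the R Markdown source is assumed to use one of the {pagedown} output formats:

```r
# Render an R Markdown source to paged HTML, then print it to PDF
# via headless Chrome using pagedown's chrome_print() helper.
rmarkdown::render("resume.Rmd")        # produces resume.html
pagedown::chrome_print("resume.html")  # produces resume.pdf
```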

Dynamic Documents with R and knitr

Developed by Yihui Xie and inspired by the earlier {Sweave} package, {knitr} is an R package designed for dynamic report generation that consolidates the functionality of numerous other add-on packages into a single, cohesive tool. It supports multiple input languages, including R, Python and shell scripts, as well as multiple output markup languages such as LaTeX, HTML, Markdown, AsciiDoc and reStructuredText. The package operates on a principle of transparency, giving users full control over how input and output are handled, and runs R code in a manner consistent with how it would behave in a standard R terminal.

Among its notable features are built-in caching, automatic code formatting via the {formatR} package, support for more than 20 graphics devices and flexible options for managing plots within documents. It also allows advanced users to define custom hooks and regular expressions to extend and tailor its behaviour further. The package is affiliated with the Foundation for Open Access Statistics, a nonprofit organisation promoting free software, open access publishing and reproducible research in statistics.

Mastering Shiny

Mastering Shiny is a comprehensive guide to developing web applications using R, focusing on the Shiny framework designed for data scientists. It introduces core concepts such as user interface design, reactive programming and dynamic content generation, while also exploring advanced topics like performance optimisation, security and modular app development. The book covers practical applications across industries, from academic teaching tools to real-time analytics dashboards, and aims to equip readers with the skills to build scalable, maintainable applications. It includes detailed chapters on workflow, layout, visualisation and user interaction, alongside case studies and technical best practices.

Engineering Production-Grade Shiny Apps

This is aimed at developers and team managers who already possess a working knowledge of the Shiny framework for R and wish to advance beyond the basics toward building robust, production-ready applications. Rather than covering introductory Shiny concepts or post-deployment concerns, the book focuses on the intermediate ground between those two stages, addressing project management, workflow, code structure and optimisation.

It introduces the {golem} package as a central framework and guides readers through a five-step workflow covering design, prototyping, building, strengthening and deployment, with additional chapters on optimisation techniques including R code performance, JavaScript integration and CSS. The book is structured to serve both those with project management responsibilities and those focused on technical development, acknowledging that in many small teams these roles are carried out by the same individual.

Outstanding User Interfaces with Shiny

Written by David Granjon and published in 2022, Outstanding User Interfaces with Shiny is a book aimed at filling the gap between beginner and advanced Shiny developers, covering how to deeply customise and enhance Shiny applications to the point where they become indistinguishable from classic web applications. The book spans a wide range of topics, including working with HTML and CSS, integrating JavaScript, building Bootstrap dashboard templates, mobile development and the use of React, providing a comprehensive resource that consolidates knowledge and experience previously scattered across the Shiny developer community.

R Packages

Now in its second edition, R Packages by Hadley Wickham and Jennifer Bryan is a freely available online guide that teaches readers how to develop packages in R. A package is the core unit of shareable and reproducible R code, typically comprising reusable functions, documentation explaining how to use them and sample data. The book guides readers through the entire process of package development, covering areas such as package structure, metadata, dependencies, testing, documentation and distribution, including how to release a package to CRAN. The authors encourage a gradual approach, noting that an imperfect first version is perfectly acceptable provided each subsequent version improves on the last.

Mastering Spark with R

Written by Javier Luraschi, Kevin Kuo and Edgar Ruiz, Mastering Spark with R is a comprehensive guide designed to take readers from little or no familiarity with Apache Spark or R through to proficiency in large-scale data science. The book covers a broad range of topics, including data analysis, modelling, pipelines, cluster management, connections, data handling, performance tuning, extensions, distributed computing, streaming and contributing to the Spark ecosystem.

Happy Git and GitHub for the useR

Here is a practical guide written by Jenny Bryan and contributors, aimed primarily at R users involved in data analysis or package development. It covers the installation and configuration of Git alongside GitHub, the development of key workflows for common tasks and the integration of these tools into day-to-day work with R and R Markdown. The guide is structured to take readers from initial setup through to more advanced daily workflows, with particular attention paid to how Git and GitHub serve the needs of data science rather than pure software development.

JavaScript for R

Written by John Coene and intended for release as part of the CRC Press R series, JavaScript for R explores how the R programming language and JavaScript can be used together to enhance data science workflows. Rather than teaching JavaScript as a standalone language, the book demonstrates how a limited working knowledge of it can meaningfully extend what R developers can achieve, particularly through the integration of external JavaScript libraries.

The book covers a broad range of topics, progressing from foundational concepts through to data visualisation using the {htmlwidgets} package, bidirectional communication with Shiny, JavaScript-powered computations via the V8 engine and Node.js and the use of modern JavaScript tools such as Vue, React and webpack alongside R. Practical examples are woven throughout, including the building of interactive visualisations, custom Shiny inputs and outputs, image classification and machine learning operations, with all accompanying code made publicly available on GitHub.

HTTP Testing in R

This guide addresses challenges faced by developers of R packages that interact with web resources, offering strategies to create reliable unit tests despite dependencies on internet connectivity, authentication and external service availability. It explores tools such as {vcr}, {webmockr}, {httptest} and {webfakes}, which enable mocking and recording HTTP requests to ensure consistent testing environments, reduce reliance on live data and improve test reliability. The text also covers advanced topics like handling errors, securing tests and ensuring compatibility with CRAN and Bioconductor, while emphasising best practices for maintaining test robustness and contributor-friendly workflows. Funded by rOpenSci and the R Consortium, the resource aims to support developers in building more resilient and maintainable R packages through structured testing approaches.

The Shiny AWS Book

The Shiny AWS Book is an online resource designed to teach data scientists how to deploy, host and maintain Shiny web applications using cloud infrastructure. Addressing a common gap in data science education, it guides readers through a range of DevOps technologies including AWS, Docker, Git, NGINX and open-source Shiny Server, covering everything from server setup and cost management to networking, security and custom configuration.

{ggplot2}: Elegant Graphics for Data Analysis

The third edition of {ggplot2}: Elegant Graphics for Data Analysis provides an in-depth exploration of the Grammar of Graphics framework, focusing on the theoretical foundations and detailed implementation of the ggplot2 package rather than offering step-by-step instructions for specific visualisations. Written by Hadley Wickham, Danielle Navarro and Thomas Lin Pedersen, the book is presented as an online work-in-progress, with content structured across sections such as layers, scales, coordinate systems and advanced programming topics. It aims to equip readers with the knowledge to customise plots according to their needs, rather than serving as a direct guide for creating predefined graphics.

YaRrr! The Pirate’s Guide to R

Written by Nathaniel D. Phillips, this is a beginner-oriented guide to learning the R programming language from the ground up, covering everything from installation and basic navigation of the RStudio environment through to more advanced topics such as data manipulation, statistical analysis and custom function writing. The guide progresses logically through foundational concepts including scalars, vectors, matrices and dataframes before moving into practical areas such as hypothesis testing, regression, ANOVA and Bayesian statistics. Visualisation is given considerable attention across dedicated chapters on plotting, while later sections address loops, debugging and managing data from a variety of file formats. Each chapter includes practical exercises to reinforce learning, and the book concludes with a solutions section for reference.

Data Visualisation: A Practical Introduction

Data Visualisation: A Practical Introduction is a forthcoming second edition from Princeton University Press, written by Kieran Healy and due for release in March 2026, which teaches readers how to explore, understand and present data using the R programming language and the {ggplot2} library. The book aims to bridge the gap between works that discuss visualisation principles without teaching the underlying tools and those that provide code recipes without explaining the reasoning behind them, instead combining both practical instruction and conceptual grounding.

Revised and updated throughout to reflect developments in R and {ggplot2}, the second edition places greater emphasis on data wrangling, introduces updated and new datasets, and substantially rewrites several chapters, particularly those covering statistical models and map-drawing. Readers are guided through building plots progressively, from basic scatter plots to complex layered graphics, with the expectation that by the end they will be able to reproduce nearly every figure in the book and understand the principles that inform each choice.

The book also addresses the growing role of large language models in coding workflows, arguing that genuine understanding of what one is doing remains essential regardless of the tools available. It is suitable for complete beginners, those with some prior R experience, and instructors looking for a course companion, and requires the installation of R, RStudio and a number of supporting packages before work can begin.

PandasGUI: A simple solution for Pandas DataFrame inspection from within VSCode

2nd September 2025

One of the things that I miss about Spyder when running Python scripts is the ability to look at DataFrames easily. Recently, I was checking a VAT return when tmux truncated how much of a DataFrame I could see in the output from the print function. While closing tmux might have been an option, I sought a DataFrame windowing alternative instead. That led me to the pandasgui package, which did almost exactly what I needed, the one catch being that it pauses script execution while showing the data. The installation was done using pip:

pip install pandasgui

Once that had completed, I could use the following code construct to accomplish what I wanted:

import pandasgui

pandasgui.show(df)

In my case, there were several lines between the two lines above. Nevertheless, the first line made the pandasgui package available to the script, while the second displayed the DataFrame in a GUI with scrollbars and cells, among other things. That was close enough to what I wanted to let me complete the task at hand.
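Putting the pattern above into a self-contained sketch (the DataFrame contents are invented for illustration), the GUI call can be guarded so the script still runs where pandasgui is not installed; note that `pandasgui.show()` blocks until its window is closed:

```python
# Minimal sketch of the inspection pattern described above.
import pandas as pd

# A toy DataFrame standing in for the real VAT data
df = pd.DataFrame(
    {"item": ["widget", "gadget"], "net": [100.0, 250.0], "vat": [20.0, 50.0]}
)

try:
    import pandasgui
    # Opens a window with scrollable cells; execution pauses here
    pandasgui.show(df)
except ImportError:
    # pandasgui not installed; fall back to plain printing
    print(df.to_string())
```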
