Technology Tales

Notes drawn from experiences in consumer and enterprise technology

TOPIC: CATEGORICAL VARIABLE

Some R packages to explore as you find your feet with the language

24th March 2026

Here are some commonly used R packages and other tools that are pervasive, along with others that I have encountered while getting started with the language, itself becoming pervasive in my line of business. The collection grew organically as my explorations proceeded, and reflects what I was trying out during my acclimatisation.

General

Here are two general packages to get things started, with one of them being unavoidable in the R world. The other is more advanced, possibly offering more to package developers.

{tidyverse}

You cannot use R without knowing about this collection of packages. In many ways, they form a mini-language of their own, drawing some criticism from those who reckon that base R functionality covers a sufficient gamut anyway. Nevertheless, there is so much here that will get you going with data wrangling and visualisation that it is worth knowing what is possible. The complaints may come from your not needing to use anything else for these purposes.

{plumber}

This R package enables developers to convert existing R functions into web API endpoints by adding roxygen2-like comment annotations to their code. Once annotated, functions can handle HTTP GET and POST requests, accept query string or JSON parameters and return outputs such as plain values or rendered plots. The package is available on CRAN as a stable release, with a development version hosted on GitHub. For deployment, it integrates with DigitalOcean through a companion package called {plumberDeploy}, and also supports Posit Connect, PM2 and Docker as hosting options. Related projects in the same space include OpenCPU, which is designed for hosting R APIs in scientific research contexts, and the now-discontinued jug package, which took a more programmatic approach to API construction.

Data Preparation

You simply cannot avoid working with data during any analysis or reporting work. While there is a learning curve if you are used to other languages, there is little doubt that R is well-endowed when it comes to performing these tasks. Here are some packages that extend base R capabilities and might even add some extra user-friendliness along the way.

{forcats}

The {forcats} package in R provides functions to manage categorical variables by reordering factor levels, collapsing infrequent values and adjusting their sequence based on frequency or other variables. It includes tools such as reordering by another variable, grouping rare categories into 'other' and modifying level order manually, which are useful for data analysis and visualisation workflows. Designed as part of the tidyverse, it integrates with other packages to streamline tasks like counting and plotting categorical data, enhancing clarity and efficiency in handling factors within R.

{tidyr}

Around this time last year, I remember completing a LinkedIn course on a set of good practices known as tidy data, where each variable occupies a column, each observation a row and each value a single cell. This package is designed to help users restructure data so it follows those rules. It provides tools for reshaping data between long and wide formats, handling nested lists, splitting or combining columns, managing missing values and layering or flattening grouped data.

Installation options include the {tidyverse} collection, standalone installation, or the development version from GitHub. The package succeeds earlier reshaping tools like {reshape2} and {reshape}, offering a focused approach to tidying data rather than general reshaping or aggregation.

{haven}

Having a long track record of working with SAS, {haven} with its abilities to read and write data files from statistical software such as SAS, SPSS and Stata, leveraging the ReadStat library, arouses my interest. Handily, it supports a range of file formats, including SAS transport and data files, SPSS system and older portable files and Stata data files up to version 15, converting these into tibbles with enhanced printing capabilities. Value labels are preserved as a labelled class, allowing conversion to factors, while dates and times are transformed into standard R classes.

{RMariaDB}

While there are other approaches to working with databases using R, {RMariaDB} provides a database interface and driver for MariaDB, designed to fully comply with the DBI specification and serve as a replacement for the older {RMySQL} package. It supports connecting to databases using configuration files, executing queries, reading and writing data tables and managing results in chunks. Installation options include binary packages from CRAN or development versions from GitHub, with additional dependencies such as MariaDB Connector/C or libmysqlclient required for Linux and macOS systems. Configuration is typically handled through a MariaDB-specific file, and the package includes acknowledgments for contributions from various developers and organisations.

COVID-19 Data Hub

For many people, the pandemic may be a fading memory, yet it offered its chances for learning R, not least because there was a use case with more than a hint of personal interest about it. Here is a library making it easier to get hold of the data, with some added pre-processing too. Memories of how I needed to wrangle what was published by various sources make me appreciate just how vital it is to have harmonised data for analysis work.

Table Production

While many appear to graphical presentation of results to their tabular display, R does have its options here too. In recent times, the options have improved, particularly of the pharmaverse initiative. Here is a selection of what I found during my explorations.

{officer}

Part of the {officeverse} along with {officedown}, {Flextable}, {Rvg} and {mschart}, the {officer} R package enables users to create and modify Word and PowerPoint documents directly from R, allowing the insertion of images, tables and formatted content, as well as the import of document content into data frames. It supports the generation of RTF files and integrates with other packages for advanced features such as vector graphics and native office charts. Installation options include CRAN and GitHub, with community resources available for assistance and contributions. The package facilitates the manipulation of document elements like paragraphs, tables and section breaks and provides tools for exporting and importing content between R and office formats, alongside functions for managing slide layouts and embedded objects in presentations.

{pharmaRTF}

If you work in clinical research like I do, the need to produce data tabulations is a non-negotiable requirement. That is how this package came to be developed and the pharmaverse of which it is part has numerous other options, should you need to look at using one of those. The flavour of RTF produced here is the Microsoft Word variety, which did not look as well in LibreOffice Writer when I last looked at the results with that open-source alternative. Otherwise, the results look well to many eyes.

{formattable}

Here is a package that enhances data presentation by applying customisable formatting to vectors and data frames, supporting formats such as percentages, currency and accounting. Available on GitHub and CRAN, it integrates with dynamic document tools like {knitr} and {rmarkdown} to produce visually distinct tables, with features including gradient colour scales, conditional styling and icon-based representations. It automatically converts to {htmlwidgets} in interactive environments and is licensed under MIT, enabling flexible use in both static and interactive data displays.

{reactable}

The {reactable} package for R provides interactive data tables built on the React Table library, offering features such as sorting, filtering, pagination, grouping with aggregation, virtual scrolling for large datasets and support for custom rendering through R or JavaScript. It integrates seamlessly into R Markdown documents and Shiny applications, enabling the use of HTML widgets and conditional styling. Installation options include CRAN and GitHub, with examples demonstrating its application across various datasets and scenarios. The package supports major web browsers and is licensed under MIT, designed for developers seeking dynamic data presentation tools within the R ecosystem.

{DT}

Particularly useful in dynamic web applications like Shiny, the {DT} package in R provides a means of rendering interactive HTML tables by building on the DataTables JavaScript library. It supports features including sorting, searching, pagination and advanced filtering, with numeric, date and time columns using range-based sliders whilst factor and character columns rely on search boxes or dropdowns. Filtering operates on the client side by default, though server-side processing is also available. JavaScript callbacks can be injected after initialisation to manipulate table behaviour, such as enabling automatic page navigation or adding child rows to display additional detail. HTML content is escaped by default as a safeguard against cross-site scripting attacks, with the option to adjust this on a per-column basis. Whilst the package integrates with Shiny applications, attention is needed around scrolling and slider positioning to prevent layout problems. Overall, the package is well suited to exploratory data analysis and the building of interactive dashboards.

{gt}

The {gt} package in R enables users to create well-structured tables with a variety of formatting options, starting from data frames or tibbles and incorporating elements such as headers, footers and customised column labels. It supports output in HTML, LaTeX and RTF formats and includes example datasets for experimentation. The package prioritises simplicity for common tasks while offering advanced functions for detailed customisation, with installation available via CRAN or GitHub. Users can access resources like documentation, community forums and example projects to explore its capabilities, and it is supported by a range of related packages that extend its functionality.

{gtsummary}

Enabling users to produce publication-ready outputs with minimal code, the {gtsummary} package offers a streamlined approach to generating analytical and summary tables in R. It automates the summarisation of data frames, regression models and other datasets, identifying variable types and calculating relevant statistics, including measures of data incompleteness. Customisation options allow for formatting, merging and styling tables to suit specific needs, while integration with packages such as {broom} and {gt} facilitates seamless incorporation into R Markdown workflows. The package supports the creation of side-by-side regression tables and provides tools for exporting results as images, HTML, Word, or LaTeX files, enhancing flexibility for reporting and sharing findings.

{huxtable}

Here is an R package designed to generate LaTeX and HTML tables with a modern, user-friendly interface, offering extensive control over styling, formatting, alignment and layout. It supports features such as custom borders, padding, background colours and cell spanning across rows or columns, with tables modifiable using standard R subsetting or dplyr functions. Examples demonstrate its use for creating simple tables, applying conditional formatting and producing regression output with statistical details. The package also facilitates quick export to formats like PDF, DOCX, HTML and XLSX. Installation options include CRAN, R-Universe and GitHub, while the name reflects its origins as an enhanced version of the {xtable} package. The logo was generated using the package itself, and the background design draws inspiration from Piet Mondrian’s artwork.

Figure Generation

R has such a reputation for graphical presentations that it is cited as a strong reason to explore what the ecosystem has to offer. While base R itself is not shabby when it comes to creating graphs and charts, these packages will extend things by quite a way. In fact, the first on this list is near enough pervasive.

{ggplot2}

Though its default formatting does not appeal to me, the myriad of options makes this a very flexible tool, albeit at the expense of some code verbosity. Multi-panel plots are not among its strengths, which may send you elsewhere for that need.

{ggforce}

Focusing on features not included in the core library, the {ggforce} package extends {ggplot2} by offering additional tools to enhance data visualisation. Designed to complement the primary role of {ggplot2} in exploratory data analysis, it provides a range of geoms, stats and other components that are well-documented and implemented, aiming to support more complex and custom plot compositions. Available for installation via CRAN or GitHub, the package includes a variety of functionalities described in detail on its associated website, though specific examples are not included here.

{cowplot}

Developed by Claus O. Wilke for internal use in his lab, {cowplot} is an R package designed to help with the creation of publication-quality figures built on top of {ggplot2}. It provides a set of themes, tools for aligning and arranging plots into compound figures and functions for annotating plots or combining them with images. The package can be installed directly from CRAN or as a development version via GitHub, and it has seen widespread use in the book Fundamentals of Data Visualisation.

{sjPlot}

The {sjPlot} package provides a range of tools for visualising data and statistical results commonly used in social science research, including frequency tables, histograms, box plots, regression models, mixed effects models, PCA, correlation matrices and cluster analyses. It supports installation via CRAN for stable releases or through GitHub for development versions, with documentation and examples available online. The package is licensed under GPL-3 and developed by Daniel Lüdecke, offering functions to create visualisations such as scatter plots, Likert scales and interaction effect plots, along with tools for constructing index variables and presenting statistical outputs in tabular formats.

{thematic}

By offering a centralised approach to theming and enabling automatic adaptation of plot styles within Shiny applications, the {thematic} package simplifies the styling of R graphics, including {ggplot2}, {lattice} and base R plots, R Markdown documents and RStudio. It allows users to apply consistent visual themes across different plotting systems, with auto-theming in Shiny and R Markdown relying on CSS and {bslib} themes, respectively. Installation requires specific versions of dependent packages such as {shiny} and {rmarkdown}, while custom fonts benefit from {showtext} or {ragg}. Users can set global defaults for background, foreground and accent colours, as well as fonts, which can be overridden with plot-specific theme adjustments. The package also defines default colour scales for qualitative and sequential data and integrates with tools like bslib to import Google Fonts, enhancing visual consistency across different environments and user interfaces.

Publishing Tools

The R ecosystem goes beyond mere graphical and tabular display production to offer means for taking things much further, often offering platforms for publishing your work. These can be used locally too, so there is no need to entrust everything to a third-party provider. The uses are endless for what is available, and it appears that Posit has used this to help with building documentation and training too.

R Markdown

What you have here is one of those distinguishing facilities of the R ecosystem, particularly for those wanting to share their analysis work with more than a hint of reproducibility. The tool combines narrative text and code to generate various outputs, supporting multiple programming languages and formats such as HTML, PDF and dashboards. It enables users to produce reports, presentations and interactive applications, with options for publishing and scheduling through platforms like RStudio Connect, facilitating collaboration and distribution of results in professional settings.

Distill for R Markdown

Distill for R Markdown is a tool designed to streamline the creation of technical documents, offering features such as code folding, syntax highlighting and theming. It builds on existing frameworks like Pandoc, MathJax and D3, enabling the production of dynamic, interactive content. Users can customise the appearance with CSS and incorporate appendices for supplementary information. The tool acknowledges the contributions of developers who created foundational libraries, ensuring accessibility and functionality for a wide audience. Its design prioritises clarity, allowing authors to focus on presenting results rather than underlying code, while maintaining flexibility for those who wish to include detailed explanations.

{shiny}

For a while, this was one of R's unique selling points, and remains as compelling a reason to use the language even when Python has got its own version of the package. Enabling the creation of interactive web applications for data analysis without requiring web development expertise allows users to build interfaces that let others explore data through dynamic visualisations and filters. Here is a simple example: an app that generates scatter plots with adjustable variables, species filters and marginal plots, hosted either on personal servers or through a dedicated hosting service.

{bslib}

The {bslib} R package offers a modern user interface toolkit for Shiny and R Markdown applications, leveraging Bootstrap to enable the creation of customisable dashboards and interactive theming. It supports the use of updated Bootstrap and Bootswatch versions while maintaining compatibility with existing defaults, and provides tools for real-time visual adjustments. Installation is available through CRAN, with example previews demonstrating its capabilities.

{rhandsontable}

Enabling users to manipulate and validate data within a spreadsheet-like interface, the {rhandsontable} package introduces an interactive data grid for R. It supports features such as custom cell rendering, validation rules and integration with Shiny applications. When used in Shiny, the widget requires explicit conversion of data using the hot_to_r function, as updates may not be immediately reflected in reactive contexts. Examples demonstrate its application in various scenarios, including date editing, financial calculations and dynamic visualisations linked to charts. The package also accommodates bookmarks in Shiny apps with specific handling. Users are encouraged to report issues or contribute improvements, with guidance provided for those seeking to expand its functionality. The development team welcomes feedback to refine the tool further, ensuring it aligns with evolving user needs.

{xaringanExtra}

{xaringanExtra} offers a range of enhancements and extensions for creating and presenting slides with xaringan, enabling features such as adding an overview tile view, making slides editable, broadcasting in real time, incorporating animations, embedding live video feeds and applying custom styles. It allows users to selectively activate individual tools or load multiple features simultaneously through a single function call, supporting tasks like adding banners, enabling code copying, fitting slides to screen dimensions and integrating utility toolkits. The package is available for installation via CRAN or GitHub, providing flexibility for developers and presenters seeking to expand the functionality of their slides.

From summary statistics to published reports with R, LaTeX and TinyTeX

19th March 2026

For anyone working across LaTeX, R Markdown and data analysis in R, there comes a point where separate tools begin to converge. Data has to be summarised, those summaries have to be turned into presentable tables and the finished result has to compile into a report that looks appropriate for its audience rather than a console dump. These notes follow that sequence, moving from the practical business of summarising data in R through to tabulation and then on to the publishing infrastructure that makes clean PDF and Word output possible.

Summarising Data with {dplyr}

The starting point for many analyses is a quick exploration of the data at hand. One useful example uses the anorexia dataset from the {MASS} package together with {dplyr}. The dataset contains weight change data for young female anorexia patients, divided into three treatment groups: Cont for the control group, CBT for cognitive behavioural treatment and FT for family treatment.

The basic manipulation starts by loading {MASS} and {dplyr}, then using filter() to create separate subsets for each treatment group. From there, mutate() adds a wtDelta column defined as Postwt - Prewt, giving the weight change for each patient. group_by(Treat) prepares the data for grouped summaries, and arrange(wtDelta) sorts within treatment groups. The notes then show how {dplyr}'s pipe operator, %>%, makes the workflow more readable by chaining these operations. The final summary table uses summarize() to compute the number of observations, the mean weight change and the standard deviation within each treatment group. The reported values are count 29, average weight change 3.006897 and standard deviation 7.308504 for CBT, count 26, average weight change -0.450000 and standard deviation 7.988705 for Cont and count 17, average weight change 7.264706 and standard deviation 7.157421 for FT.

That example is not presented as a complete statistical analysis. Instead, it serves as a quick exploratory route into the data, with the wording remaining appropriately cautious and noting that this is only a glance and not a rigorous analysis.

Choosing an R Package for Descriptive Summaries

The question of how best to summarise data opens up a broader comparison of R packages for descriptive statistics. A useful review sets out a common set of needs: a count of observations, the number and types of fields, transparent handling of missing data and sensible statistics that depend on the data type. Numeric variables call for measures such as mean, median, range and standard deviation, perhaps with percentiles. Categorical variables call for counts of levels and some sense of which categories dominate.

Base R's summary() does some of this reasonably well. It distinguishes categorical from numeric variables and reports distributions or numeric summaries accordingly, while also highlighting missing values. Yet, it does not show an overall record count, lacks standard deviation and is not especially tidy or ready for tools such as kable. Several contributed packages aim to improve on that. Hmisc::describe() gives counts of variables and observations, handles both categorical and numerical data and reports missing values clearly, showing the highest and lowest five values for numeric data instead of a simple range. pastecs::stat.desc() is more focused on numeric variables and provides confidence intervals, standard errors and optional normality tests. psych::describe() includes categorical variables but converts them to numeric codes by default before describing them, which the package documentation itself advises should be interpreted cautiously. psych::describeBy() extends this approach to grouped summaries and can return a matrix form with mat = TRUE.

Among the packages reviewed, {skimr} receives especially strong attention for balancing readability and downstream usefulness. skim() reports record and variable counts clearly, separates variables by type and includes missing data and standard summaries in an accessible layout. It also works with group_by() from {dplyr}, making grouped summaries straightforward to produce. More importantly for analytical workflows, the skim output can be treated as a tidy data frame in which each combination of variable and statistic is represented in long form, meaning the results can be filtered, transformed and plotted with standard tidyverse tools such as {ggplot2}.

{summarytools} is presented as another strong option, though with a distinction between its functions. descr() handles numeric variables and can be converted to a data frame for use with kable, while dfSummary() works across entire data frames and produces an especially polished summary. At the time of the original notes, dfSummary() was considered slow. The package author subsequently traced the issue, as documented in the same review, to an excessive number of histogram breaks being generated for variables with large values, imposing a limit to resolve it. The package also supports output through view(dfSummary(data)), which yields an attractive HTML-style summary.

Grouped Summary Table Packages

Once the data has been summarised, the next step is turning those summaries into formal tables. A detailed comparison covers a number of packages specifically designed for this purpose: {arsenal}, {qwraps2}, {Amisc}, {table1}, {tangram}, {furniture}, {tableone}, {compareGroups} and {Gmisc}. {arsenal} is described as highly functional and flexible, with tableby() able to create grouped tables in only a few lines and then be customised through control objects that specify tests, display statistics, labels and missing value treatment. {qwraps2} offers a lot of flexibility through nested lists of summary specifications, though at the cost of more code. {Amisc} can produce grouped tables and works with pander::pandoc.table(), but is noted as not being on CRAN. {table1} creates attractive tables with minimal code, though its treatment of missing values may not suit every use case. {tangram} produces visually appealing HTML output and allows custom rows such as missing counts to be inserted manually, although only HTML output is supported. {furniture} and {tableone} both support grouped table creation, but {tableone} in particular is notable because it is widely used in biomedical research for baseline characteristics tables.

The {tableone} package deserves separate mention because it is designed to summarise continuous and categorical variables in one table, a common need in medical papers. As the package introduction explains, CreateTableOne() can be used on an entire dataset or on a selected subset of variables, with factorVars specifying variables that are coded numerically but should be treated as categorical. The package can display all levels for categorical variables, report missing values via summary() and switch selected continuous variables to non-normal summaries using medians and interquartile ranges instead of means and standard deviations. For grouped comparisons, it prints p-values by default and can switch to non-parametric tests or Fisher's exact test where needed. Standardised mean differences can also be shown. Output can be captured as a matrix and written to CSV for editing in Excel or Word.

Styling and Exporting Tables

With tables constructed, the focus shifts to how they are presented and exported. As Hao Zhu's conference slides explain, the {kableExtra} package builds on knitr::kable() and provides a grammar-like approach to adding styling layers, importing the pipe %>% symbol from {magrittr} so that formatting functions can be added in the same way that layers are added in {ggplot2}. It supports themes such as kable_paper, kable_classic, kable_minimal and kable_material, as well as options for striping, hover effects, condensed layouts, fixed headers, grouped rows and columns, footnotes, scroll boxes and inline plots.

Table output is often the visible end of an analysis, and a broader review of R table packages covers a range of approaches that go well beyond the default output. In R Markdown, packages such as {gt}, {kableExtra}, {formattable}, {DT}, {reactable}, {reactablefmtr} and {flextable} all offer richer possibilities. Some are aimed mainly at HTML output, others at Word. {DT} in particular supports highly customised interactive tables with searching, filtering and cell styling through more advanced R and HTML code. {flextable} is highlighted as the strongest option when knitting to Word, given that the other packages are primarily designed for HTML.

For users working in Word-heavy settings, older but still practical workflows remain relevant too. One approach is simply to write tables to comma-separated text files and then paste and convert the content into a Word table. Another route is through {arsenal}'s write2 functions, designed as an alternative to SAS ODS. The convenience functions write2word(), write2html() and write2pdf() accept a wide range of objects: tableby, modelsum, freqlist and comparedf from {arsenal} itself, as well as knitr::kable(), xtable::xtable() and pander::pander_return() output. One notable constraint is that {xtable} is incompatible with write2word(). Beyond single tables, the functions accept a list of objects so that multiple tables, headers, paragraphs and even raw HTML or LaTeX can all be combined into a single output document. A yaml() helper adds a YAML header to the output, and a code.chunk() helper embeds executable R code chunks, while the generic write2() function handles formats beyond the three convenience wrappers, such as RTF.

The Publishing Infrastructure: CTAN and Its Mirrors

Producing PDF output from R Markdown depends on a working LaTeX installation, and the backbone of that ecosystem is CTAN, the Comprehensive TeX Archive Network. CTAN is the main archive for TeX and LaTeX packages and is supported by a large collection of mirrors spread around the world. The purpose of this distributed system is straightforward: users are encouraged to fetch files from a site that is close to them in network terms, which reduces load and tends to improve speed.

That global spread is extensive. The CTAN mirror list organises sites alphabetically by continent and then by country, with active sites listed across Africa, Asia, Europe, North America, Oceania and South America. Africa includes mirrors in South Africa and Morocco. Asia has particularly wide coverage, with many mirrors in China as well as sites in Korea, Hong Kong, India, Indonesia, Japan, Singapore, Taiwan, Saudi Arabia and Thailand. Europe is especially rich in mirrors, with hosts in Denmark, Germany, Spain, France, Italy, the Netherlands, Norway, Poland, Portugal, Romania, Switzerland, Finland, Sweden, the United Kingdom, Austria, Greece, Bulgaria and Russia. North America includes Canada, Costa Rica and the United States, while Oceania covers Australia and South America includes Brazil and Chile.

The details matter because different mirrors expose different protocols. While many support HTTPS, some also offer HTTP, FTP or rsync. CTAN provides a mirror multiplexer to make the common case simpler: pointing a browser to https://mirrors.ctan.org/ results in automatic redirection to a mirror in or near the user's country. There is one caveat. The multiplexer always redirects to an HTTPS mirror, so anyone intending to use another protocol needs to select manually from the mirror list. That is why the full listings still include non-HTTPS URLs alongside secure ones.

There is also an operational side to the network that is easy to overlook when things are working well. CTAN monitors mirrors to ensure they are current, and if one falls behind, then mirrors.ctan.org will not redirect users there. Updates to the mirror list can be sent to ctan@ctan.org. The master host of CTAN is ftp.dante.de in Cologne, Germany, with rsync access available at rsync://rsync.dante.ctan.org/CTAN/ and web access on https://ctan.org/. For those who want to contribute infrastructure rather than simply use it, CTAN also invites volunteers to become mirrors.

TinyTeX: A Lightweight LaTeX Distribution

This infrastructure becomes much more tangible when looking at a lightweight TeX distribution such as TinyTeX. TinyTeX is a lightweight, cross-platform, portable and easy-to-maintain LaTeX distribution based on TeX Live. It is small in size but intended to function well in most situations, especially for R users. Its appeal lies in not requiring users to install thousands of packages they will never use, installing them as needed instead. This also means installation can be done without administrator privileges, which removes one of the more familiar barriers around traditional TeX setups. TinyTeX can even be run from a flash drive.

For R users, TinyTeX is closely tied to the {tinytex} R package. The distinction is important: tinytex in lower case refers to the R package, while TinyTeX refers to the LaTeX distribution. Installation is intentionally direct. After installing the R package with install.packages('tinytex'), a user can run tinytex::install_tinytex(). Uninstallation is equally simple with tinytex::uninstall_tinytex(). For the average R Markdown user, that is often enough. Once TinyTeX is in place, PDF compilation usually requires no further manual package management.

There is slightly more to know if the aim is to compile standalone LaTeX documents from R. The {tinytex} package provides wrappers such as pdflatex(), xelatex() and lualatex(). These functions detect required LaTeX packages that are missing and install them automatically by default. In practical terms, that means a small example document can be written to a file and compiled with tinytex::pdflatex('test.tex') without much concern about whether every dependency has already been installed. For R users, this largely removes the old pattern of cryptic missing-package errors followed by manual searching through TeX repositories.

Developers may want more than the basics, and TinyTeX has a path for that as well. A helper such as tinytex:::install_yihui_pkgs() installs a collection of packages needed for building the PDF vignettes of many CRAN packages. That is a specific convenience rather than a universal requirement, but it illustrates the design philosophy behind TinyTeX: keep the initial footprint light and offer ways to add what is commonly needed later.

Using TinyTeX Outside R

For users outside R, TinyTeX still works, but the focus shifts to the command-line utility tlmgr. The documentation is direct in its assumptions: if command-line work is unwelcome, another LaTeX distribution may be a better fit. The central command is tlmgr, and much of TinyTeX maintenance can be expressed through it.

On Linux, installation places TinyTeX in $HOME/.TinyTeX and creates symlinks for executables such as pdflatex under $HOME/bin or $HOME/.local/bin if it exists. The installation script is fetched with wget and piped to sh, after first checking that Perl is correctly installed. On macOS, TinyTeX lives in ~/Library/TinyTeX, and users without write permission to /usr/local/bin may need to change ownership of that directory before installation. Windows users can run a batch file, install-bin-windows.bat, and the default installation directory is %APPDATA%/TinyTeX unless APPDATA contains spaces or non-ASCII characters, in which case %ProgramData% is used instead. PowerShell version 3.0 or higher is required on Windows.

Uninstallation follows the same self-contained logic. On Linux and macOS, tlmgr path remove is followed by deleting the TinyTeX folder. On Windows, tlmgr path remove is followed by removing the installation directory. This simplicity is a deliberate contrast with larger LaTeX distributions, which are considerably more involved to remove cleanly.

Maintenance and Package Management

Maintenance is where TinyTeX's relationship to CTAN and TeX Live becomes especially visible. If a document fails with an error such as File 'times.sty' not found, the fix is to search for the package containing that file with tlmgr search --global --file "/times.sty". In the example given, that identifies the psnfss package, which can then be installed with tlmgr install psnfss. If the package includes executables, tlmgr path add may also be needed. An alternative route is to upload the error log to the yihui/latex-pass GitHub repository, where package searching is carried out remotely.

If the problem is less obvious, a full update cycle is suggested: tlmgr update --self --all, then tlmgr path add and fmtutil-sys --all. R users have wrappers for these tasks too, including tlmgr_search(), tlmgr_install() and tlmgr_update(). Some situations still require a full reinstallation. If TeX Live reports Remote repository newer than local, TinyTeX should be reinstalled manually, which for R users can be done with tinytex::reinstall_tinytex(). Similarly, when a TeX Live release is frozen in preparation for a new one, the advice is simply to wait and then reinstall when the next release is ready.

The motivation behind TinyTeX is laid out with unusual clarity. Traditional LaTeX distributions often present a choice between a small basic installation that soon proves incomplete and a very large full installation containing thousands of packages that will never be used. TinyTeX is framed as a way around those frustrations by building on TeX Live's portability and cross-platform design while stripping away unnecessary size and complexity. The acknowledgements also underline that TinyTeX depends on the work of the TeX Live team.

Connecting the R Workflow to a Finished Report

Taken together, these notes show how closely summarisation, tabulation and publishing are linked. {dplyr} and related tools make it easy to summarise data quickly, while a wide range of R packages then turn those summaries into tables that are not only statistically useful but also presentable. CTAN and its mirrors keep the TeX ecosystem available and current across the world, and TinyTeX builds on that ecosystem to make LaTeX more manageable, especially for R users. What begins with a grouped summary in the console can end with a polished report table in HTML, PDF or Word, and understanding the chain between those stages makes the whole workflow feel considerably less mysterious.

  • The content, images, and materials on this website are protected by copyright law and may not be reproduced, distributed, transmitted, displayed, or published in any form without the prior written permission of the copyright holder. All trademarks, logos, and brand names mentioned on this website are the property of their respective owners. Unauthorised use or duplication of these materials may violate copyright, trademark and other applicable laws, and could result in criminal or civil penalties.

  • All comments on this website are moderated and should contribute meaningfully to the discussion. We welcome diverse viewpoints expressed respectfully, but reserve the right to remove any comments containing hate speech, profanity, personal attacks, spam, promotional content or other inappropriate material without notice. Please note that comment moderation may take up to 24 hours, and that repeatedly violating these guidelines may result in being banned from future participation.

  • By submitting a comment, you grant us the right to publish and edit it as needed, whilst retaining your ownership of the content. Your email address will never be published or shared, though it is required for moderation purposes.