R | Technology Tales

TOPIC: R

Automating Positron and RStudio updates on Linux Mint 22

6th November 2025

Elsewhere, I have written about avoiding manual updates with VSCode and VSCodium. Here, I come to the IDEs produced by Posit, formerly RStudio, for data science and analytics uses: Positron and RStudio. The first is a more recent innovation that works with both R and Python code natively, while the second has been around for much longer and focusses on native R code alone, though there are R packages allowing an interface of sorts with Python. Neither is released via a PPA, necessitating either manual downloading or the scripted approach taken here for a Linux system. Each software tool will be discussed in turn.

Positron

Now, we work through a script that automates the upgrade process for Positron. This starts with a shebang line calling the bash executable before moving to a line that adds safety to how the script works using a set statement. Here, the -e switch triggers exiting whenever a command returns an error, halting the script before it carries on to perform any undesirable actions. That is followed by the -u switch that causes errors when unset variables are referenced; normally these would expand to an empty string, which is not desirable in all cases. Lastly, the -o pipefail switch causes a pipeline (cmd1 | cmd2 | cmd3) to fail if any command in the pipeline produces an error, instead of reporting only the status of the last command, which can help debugging because a failure earlier in the pipeline no longer goes unnoticed.

#!/bin/bash
set -euo pipefail
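
To make the effect of each switch concrete, the behaviour can be probed from outside the script. What follows is a minimal Python sketch, assuming that bash is on the PATH; the run_bash helper and the NOT_SET_ANYWHERE variable name are made up for illustration and are not part of the update script.

```python
import subprocess


def run_bash(snippet: str) -> subprocess.CompletedProcess:
    """Run a short snippet with bash, capturing exit status and output."""
    return subprocess.run(
        ["bash", "-c", snippet], capture_output=True, text=True
    )


# -e: the script stops at the first failing command, so "after" never prints.
with_e = run_bash("set -e; false; echo after")

# -u: referencing an unset variable becomes an error instead of expanding
# to an empty string (the variable is unset explicitly to be sure).
without_u = run_bash('unset NOT_SET_ANYWHERE; echo "[$NOT_SET_ANYWHERE]"')
with_u = run_bash('unset NOT_SET_ANYWHERE; set -u; echo "[$NOT_SET_ANYWHERE]"')

# pipefail: a failure in any stage fails the whole pipeline, whereas the
# default is to report only the status of the last command.
without_pipefail = run_bash("false | true")
with_pipefail = run_bash("set -o pipefail; false | true")

results = [
    ("with -e", with_e),
    ("without -u", without_u),
    ("with -u", with_u),
    ("without pipefail", without_pipefail),
    ("with pipefail", with_pipefail),
]
for label, result in results:
    print(f"{label:<18} exit={result.returncode} stdout={result.stdout!r}")
```

Each snippet runs in its own bash process, so none of the switches leak into the shell from which the sketch is started.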

The next step then is to determine the architecture of the system on which the script is running so that the correct download is selected.

ARCH=$(uname -m)
case "$ARCH" in
  x86_64) POSIT_ARCH="x64" ;;
  aarch64|arm64) POSIT_ARCH="arm64" ;;
  *) echo "Unsupported arch: $ARCH"; exit 1 ;;
esac

Once that completes, we define the address of the web page to be interrogated and the path to the temporary file that is to be downloaded.

RELEASES_URL="https://github.com/posit-dev/positron/releases"
TMPFILE="/tmp/positron-latest.deb"

Now, we scrape the page to find the address of the latest DEB file that has been released.

echo "Finding latest Positron .deb for $POSIT_ARCH..."
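# "|| true" stops set -e and pipefail from aborting here when nothing matches,
# leaving the check that follows to report the problem
DEB_URL=$(curl -fsSL "$RELEASES_URL" \
  | grep -Eo "https://cdn\.posit\.co/[A-Za-z0-9/_\.-]+Positron-[0-9\.~-]+-${POSIT_ARCH}\.deb" \
  | head -n 1 || true)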

If no link is found, an error message is produced before the script is aborted.

if [ -z "${DEB_URL:-}" ]; then
  echo "Could not find a .deb link for ${POSIT_ARCH} on the releases page"
  exit 1
fi
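
Because the scraping depends on a regular expression, it can be worth checking the pattern against saved text before trusting it with a live page. The following is a rough Python sketch of the same match; the find_deb_url function is hypothetical, and the sample links are invented for the check, so the real paths on the releases page may differ.

```python
import re
from typing import Optional


def find_deb_url(page_text: str, arch: str) -> Optional[str]:
    """Return the first Positron .deb link for arch, or None if none is found."""
    pattern = re.compile(
        r"https://cdn\.posit\.co/[A-Za-z0-9/_.\-]+Positron-[0-9.~\-]+-"
        + re.escape(arch)
        + r"\.deb"
    )
    match = pattern.search(page_text)
    return match.group(0) if match else None


# Invented sample text standing in for the releases page.
sample_page = """
<a href="https://cdn.posit.co/example/path/Positron-2025.01.0-1-x64.deb">x64</a>
<a href="https://cdn.posit.co/example/path/Positron-2025.01.0-1-arm64.deb">arm64</a>
"""

print(find_deb_url(sample_page, "x64"))
print(find_deb_url(sample_page, "arm64"))
print(find_deb_url("no links here", "x64"))
```

Returning None for a page without links mirrors the empty-string test in the shell script, where the absence of a match is treated as a reason to stop.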

Should all go well thus far, we download the latest DEB file using curl.

echo "Downloading: $DEB_URL"
curl -fL "$DEB_URL" -o "$TMPFILE"

When the download completes, we try installing the package using apt, much like we do with a repo, apart from specifying an actual file path on our system.

echo "Installing Positron..."
sudo apt install -y "$TMPFILE"

Following that, we delete the installation file and issue a message informing the user of the task's successful completion.

echo "Cleaning up..."
rm -f "$TMPFILE"

echo "Done."

When I do this, I tend to find that the Python REPL console does not open straight away, causing me to shut down Positron and leave things for a while before starting it again. There may be temporary files that need to be expunged, and that takes its own time. Someone else might have a better explanation that I am happy to use if that makes more sense than what I am suggesting. Otherwise, all works well.

RStudio

A lot of the same processing happens during the script updating RStudio, so we will just cover the differences. The set -x statement ensures that every command is printed to the console for the debugging that was needed while this was being developed. Otherwise, much code, including architecture detection, is shared between the two apps.

#!/bin/bash
set -euo pipefail
set -x

# --- Detect architecture ---
ARCH=$(uname -m)
case "$ARCH" in
  x86_64) RSTUDIO_ARCH="amd64" ;;
  aarch64|arm64) RSTUDIO_ARCH="arm64" ;;
  *) echo "Unsupported architecture: $ARCH"; exit 1 ;;
esac

Figuring out the distro version and the web page to scrape was where additional effort was needed, and that is reflected in some of the code that follows. Otherwise, many of the ideas applied with Positron also have a place here.

# --- Detect Ubuntu base ---
DISTRO=$(grep -oP '(?<=UBUNTU_CODENAME=).*' /etc/os-release || true)
[ -z "$DISTRO" ] && DISTRO="noble"

# --- Define paths ---
TMPFILE="/tmp/rstudio-latest.deb"
LOGFILE="/var/log/rstudio_update.log"

echo "Detected Ubuntu base: ${DISTRO}"
echo "Fetching latest version number from Posit..."

# --- Get version from Posit's official RStudio Desktop page ---
# ("|| true" keeps set -e and pipefail from aborting before the empty check below)
VERSION=$(curl -s https://posit.co/download/rstudio-desktop/ \
  | grep -Eo 'rstudio-[0-9]+\.[0-9]+\.[0-9]+-[0-9]+' \
  | head -n 1 \
  | sed -E 's/rstudio-([0-9]+\.[0-9]+\.[0-9]+-[0-9]+)/\1/' || true)

if [ -z "$VERSION" ]; then
  echo "Error: Could not extract the latest RStudio version number from Posit's site."
  exit 1
fi

echo "Latest RStudio version detected: ${VERSION}"

# --- Construct download URL (Jammy build for Noble until Noble builds exist) ---
BASE_DISTRO="jammy"
BASE_URL="https://download1.rstudio.org/electron/${BASE_DISTRO}/${RSTUDIO_ARCH}"
FULL_URL="${BASE_URL}/rstudio-${VERSION}-${RSTUDIO_ARCH}.deb"

echo "Downloading from:"
echo "  ${FULL_URL}"

# --- Validate URL before downloading ---
if ! curl --head --silent --fail "$FULL_URL" >/dev/null; then
  echo "Error: The expected RStudio package was not found at ${FULL_URL}"
  exit 1
fi

# --- Download and install ---
curl -fL "$FULL_URL" -o "$TMPFILE"
echo "Installing RStudio..."
# /var/log is normally writable by root only, hence sudo for tee as well
sudo apt install -y "$TMPFILE" | sudo tee -a "$LOGFILE"

# --- Clean up ---
rm -f "$TMPFILE"
echo "RStudio update to version ${VERSION} completed successfully." | sudo tee -a "$LOGFILE"
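
The version extraction and URL construction are the parts most likely to break if Posit changes its site, so they lend themselves to an offline check. Below is a small Python sketch of those two steps alone; the function names and the sample version string are assumptions made for illustration, while the URL layout is copied from the script above.

```python
import re
from typing import Optional

VERSION_PATTERN = re.compile(r"rstudio-([0-9]+\.[0-9]+\.[0-9]+-[0-9]+)")


def extract_version(page_text: str) -> Optional[str]:
    """Return the first RStudio version string found in the page text."""
    match = VERSION_PATTERN.search(page_text)
    return match.group(1) if match else None


def build_url(version: str, arch: str, base_distro: str = "jammy") -> str:
    """Assemble the download URL using the same layout as the shell script."""
    base_url = f"https://download1.rstudio.org/electron/{base_distro}/{arch}"
    return f"{base_url}/rstudio-{version}-{arch}.deb"


# Invented text standing in for the download page.
sample_page = "Download rstudio-2025.01.0-100-amd64.deb today"

version = extract_version(sample_page)
if version is None:
    print("No version found")
else:
    print(build_url(version, "amd64"))
```

Using a capture group in the pattern does the work of the sed step in the shell version, since only the part after the rstudio- prefix is returned.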

When it all finished, RStudio worked without a hitch, leaving me to move on to other things. The next time that I am prompted to upgrade the environment, this is the way that I likely will go.

Avoiding Python missing package errors with automatic installation checks

20th October 2025

Though some may not like having something preceding package import statements in Python scripts, I prefer the added robustness of an extra piece of code checking for package presence and installing anything that is missing in place of getting an error. In what follows, I define the list of packages that need to be present for everything to work:

required_packages = ["pandas", "tqdm", "progressbar2", "sqlalchemy", "pymysql"]

Then, I declare the inbuilt modules in advance of looping through the list that was already defined (adding special handling for a case where there has been a name change):

import subprocess
import sys
for package in required_packages:
    try:
        __import__(package if package != "progressbar2" else "progressbar")
        print(f"{package} is already installed.")
    except ImportError:
        print(f"{package} not found. Installing...")
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])

The above code tries importing the package and catches the error to do the required installation. While a stable environment may be a better way around all of this, I find that this way of working adds valuable robustness to a script and automates what you would need to do anyway. Though the use of requirements files and even the Poetry tool for dependency management may be next steps, this approach suffices for my simpler needs, at least when it comes to personal projects.
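
One possible refinement, offered as a sketch and not as the method used above, is to keep the pip name and the import name side by side and to ask importlib whether a module can be found without importing it. The REQUIRED mapping and both function names are hypothetical; of the packages listed in this post, only progressbar2 imports under a different name.

```python
import importlib.util
import subprocess
import sys
from typing import Dict, List

# pip name -> import name (hypothetical mapping built from the list above)
REQUIRED: Dict[str, str] = {
    "pandas": "pandas",
    "tqdm": "tqdm",
    "progressbar2": "progressbar",
    "sqlalchemy": "sqlalchemy",
    "pymysql": "pymysql",
}


def missing_packages(required: Dict[str, str]) -> List[str]:
    """Return the pip names whose import names cannot be found."""
    return [
        pip_name
        for pip_name, import_name in required.items()
        if importlib.util.find_spec(import_name) is None
    ]


def install_missing(pip_names: List[str]) -> None:
    """Install each named package with the pip of the running interpreter."""
    for pip_name in pip_names:
        subprocess.check_call([sys.executable, "-m", "pip", "install", pip_name])


# Only report here; installation is left as an explicit second step.
print("Missing packages:", missing_packages(REQUIRED))
```

Calling install_missing(missing_packages(REQUIRED)) ahead of the real imports would then do the same job as the loop shown earlier, with the reporting and the installing kept as separate steps.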

PandasGUI: A simple solution for Pandas DataFrame inspection from within VSCode

2nd September 2025

One of the things that I miss about Spyder when running Python scripts is the ability to look at DataFrames easily. Recently, I was checking a VAT return only for tmux to truncate how much of the DataFrame I could see in output from the print function. While closing tmux might have been an idea, I sought the DataFrame windowing alternative. That led me to the pandasgui package, which did exactly what I needed, apart from pausing the script execution to show me the data. The installation was done using pip:

pip install pandasgui

Once that completed, I could use the following code construct to accomplish what I wanted:

import pandasgui

pandasgui.show(df)

In my case, there were several lines between the two lines above. Nevertheless, the first line made the pandasgui package available to the script, while the second one displayed the DataFrame in a GUI with scrollbars and cells, among other things. That was close enough to what I wanted to leave me able to complete the task that was needed of me.

A look at the Julia programming language

19th November 2022

Several open-source computing languages get mentioned when talking about working with data. R and Python are among these, and Julia is another. It took a while before I got to check out Julia because I felt the need to get acquainted with R and Python beforehand. There are others like Lua to investigate too, but that can wait for now.

With the way that R is making an incursion into clinical data reporting and analysis following the passage of decades when SAS was predominant, my explorations of Julia are inspired by a certain contrariness on my part. Alongside some small personal projects, there has been some reading in (digital) book form and online. Concerning the latter of these, there are useful tutorials like Introduction to Data Science: Learn Julia Programming, Maths & Data Science from Scratch or Julia Programming: a Hands-on Tutorial. Like what happens with R, there are online versions of published books available free of charge, and they include Julia Data Science and Interactive Visualization and Plotting with Julia. Video learning can help too, and Jane Herriman has recorded and shared useful beginner's guides on YouTube that start with the basics before heading on to more advanced subjects like multiple dispatch, broadcasting and metaprogramming.

This piece of learning has been made of simple self-inspired puzzles before moving on to anything more complex. That differs from my dalliance with R and Python, where I ventured into complexity first, not least because of testing them out with public COVID data. Eventually, I got around to doing that with Julia too, though my interest was beginning to wane by then, and Julia's abilities for creating multipage PDF files were such that the PDF Toolkit was needed to help with this. Along the way, I have made use of such packages as CSV.jl, DataFrames.jl, DataFramesMeta, Plots, Gadfly.jl, XLSX.jl and JSON3.jl, among others. After that, there is PrettyTables.jl to try out, and anyone can look at the Beautiful Makie website to see what Makie can do. There are plenty of other packages creating graphs, such as SpatialGraphs.jl, PGFPlotsX and GRUtils.jl. For formatting numbers, options include Format.jl and Humanize.jl.

So far, my primary usage has been with personal financial data together with automated processing and backup of photo files. The photo file processing has taken advantage of the ability to compile Julia scripts for added speed because just-in-time compilation always means there is a lag before the real work begins.

VS Code is my chosen editor for working with Julia scripts, since it has a plugin for the language. That adds the REPL, syntax highlighting, execution and data frame viewing capabilities that once were added to the now defunct Atom editor by its own plugin. While it would be nice to have a keyboard shortcut for script execution, the whole thing works well and is regularly updated.

Naturally, there have been a load of queries as I have gone along and the Julia Documentation has been consulted as well as Julia Discourse and Stack Overflow. The latter pair have become regular landing spots on many a Google search. One example followed a glitch that I encountered after a Julia upgrade when I asked a question about this and was directed to the XLSX.jl Migration Guides where I got the information that I needed to fix my code for it to run properly.

There is more learning to do as I continue to use Julia for various things. Once compiled, it does run fast, as has been promised. The syntax paradigm is akin to R and Python, but there are Julia-specific features too. If you have used the others, the learning curve is lessened but not eliminated completely. This is not an object-oriented language as such, but its functional nature makes it familiar enough for getting going with it. In short, the project has come a long way since it started more than ten years ago. There is much for the scientific programmer, but only time will tell if it usurps its older competitors. For now, I will remain interested in it.

Some books and other forms of documentation on R

11th September 2021

The thrust of an exhortation from a computing handbook publisher comes to mind here: don't just look things up on Google, read a book so you really understand what you are doing. A form of words like that was used to sell an eBook on GitHub, but the same sentiment applies to R or any other computing language. While using a search engine will get you going or add to existing knowledge, only a book or a training course will help to embed real competence.

In the case of R, there is a myriad of blogs out there that can be consulted, as well as function and package documentation on RDocumentation or rdrr.io. For the former, R-bloggers or R Weekly can make good places to start, while ones like Stats and R, Statistics Globe, STHDA, PSI's VIS-SIG and anything from Posit (including their main blog as well as their AI one) can be worth consulting. Additionally, there are RStudio Education and the NHS-R Community, which also have a GitHub repository together with a YouTube channel. Many packages have dedicated websites as well, so there is no lack of documentation; here is a selection:

Tidyverse

forcats

tidyr

Distill for R Markdown

To come to the real subject of this post, R is unusual in that books that you can buy also have companion websites that contain the same content with the same structure. Whatever funds this approach (and some appear to be supported by RStudio itself by the looks of things), there certainly are many books available freely online in HTML as you will see from the list below, while a few do not have a print counterpart as far as I know:

Big Book of R

R Programming for Data Science

Hands-On Programming with R

Advanced R

Cookbook for R

R Graphics Cookbook

R Markdown: The Definitive Guide

R Markdown Cookbook

RMarkdown for Scientists

bookdown: Authoring Books and Technical Documents with R Markdown

blogdown: Creating Websites with R Markdown

pagedown: Create Paged HTML Documents for Printing from R Markdown

Dynamic Documents with R and knitr

Mastering Shiny

Engineering Production-Grade Shiny Apps

Outstanding User Interfaces with Shiny

R Packages

Mastering Spark with R

Happy Git and GitHub for the useR

JavaScript for R

HTTP Testing in R

The Shiny AWS Book

Many of the above have counterparts published by O'Reilly or Chapman & Hall, to name the two publishers that I have found so far. Aside from sharing these with you, there is also the personal motivation of having the collection of links somewhere so I can close tabs in my Firefox session. There are other web articles open in other tabs that I need to retain and share, but these will need to do for now, and I hope that you find them as useful as I do.