TOPIC: APT
A Practical Linux Administration Toolkit: Kernels, Storage, Filesystems, Transfers and Shell Completion
Linux command-line administration has a way of beginning with a deceptively simple question that opens into several possible answers. Whether the task is checking which kernels are installed before an upgrade, mounting an NFS share for backup access, diagnosing low disk space, throttling a long-running sync job or wiring up tab completion, the right answer depends on context: the distribution, the file system type, the transport protocol and whether the need is a one-off action or a persistent configuration. This guide draws those everyday administrative themes into a single continuous reference.
Identifying Your System and Installed Kernels
Reading Distribution Information
A sensible place to begin any administration session is knowing exactly what you are working with. One quick approach is to read the release files directly:
cat /etc/*-release
On systems where bat is available (sometimes installed as batcat), the same files can be read with syntax highlighting using batcat /etc/*-release. Typical output on Ubuntu includes /etc/lsb-release and /etc/os-release, with values such as DISTRIB_ID=Ubuntu, VERSION_ID="20.04" and PRETTY_NAME="Ubuntu 20.04.6 LTS". Three additional commands, cat /etc/os-release, lsb_release -a and hostnamectl, each present the same underlying facts in slightly different formats, while uname -r reports the currently running kernel release in isolation. Adding more flags with uname -mrs extends the output to include the kernel name and machine hardware class, which on an older RHEL system might return something like Linux 2.6.18-8.1.14.el5 x86_64.
Querying Installed Kernels by Package Manager
On Red Hat Enterprise Linux, CentOS, Rocky Linux, AlmaLinux, Oracle Linux and Fedora, installed kernels are managed by the RPM package database and are queried with:
rpm -qa kernel
This may return entries such as kernel-5.14.0-70.30.1.el9_0.x86_64. The same information is also accessible through yum list installed kernel or dnf list installed kernel. On Debian, Ubuntu, Linux Mint and Pop!_OS the package manager differs, so the command changes accordingly:
dpkg --list | grep linux-image
Output may include versioned packages, such as linux-image-2.6.20-15-generic, alongside the metapackage linux-image-generic. Arch Linux users can query with pacman -Q | grep linux, while SUSE Enterprise Linux and openSUSE users can turn to rpm -qa | grep -i kernel or use zypper search -i kernel, which presents results in a structured table. Alpine Linux takes yet another approach with apk info -vvv | grep -E 'Linux' | grep -iE 'lts|virt', which may return entries such as linux-virt-5.15.98-r0 - Linux lts kernel.
Finding Kernels Outside the Package Manager
Package databases do not always tell the whole story, particularly where custom-compiled kernels are involved. A kernel built and installed manually will not appear in any package manager query at all. In that case, /lib/modules/ is a useful place to look, since each installed kernel generally has a corresponding module directory. Running ls -l /lib/modules/ may show entries such as 4.15.0-55-generic, 4.18.0-25-generic and 5.0.0-23-generic. A further check is:
sudo find /boot/ -iname "vmlinuz*"
This may return files such as /boot/vmlinuz-5.4.0-65-generic and /boot/vmlinuz-5.4.0-66-generic, confirming precisely which versions exist on disk.
A Brief History of vmlinuz
That naming convention is worth understanding because it appears on virtually every Linux system. vmlinuz is the compressed, bootable Linux kernel image stored in /boot/. The name traces back through computing history: early Unix kernels were simply called /unix, but when the University of California, Berkeley ported Unix to the VAX architecture in 1979 and added paged virtual memory, the resulting system, 3BSD, was known as VMUNIX (Virtual Memory Unix) and its kernel images were named /vmunix. Linux inherited vmlinuz as a mutation of vmunix, with the trailing z denoting gzip compression (though other algorithms such as xz and lzma are also supported). The counterpart vmlinux refers to the uncompressed, non-bootable kernel file, which is used for debugging and symbol table generation but is not loaded directly at boot. Running ls -l /boot/ will show the full set of boot files present on any given system.
Examining and Investigating Disk Usage
Why ls Is Not the Right Tool for Directory Sizes
Storage management is an area where a familiar command can mislead. Running ls -l on a directory typically shows it occupying 4,096 bytes, which reflects the directory entry metadata rather than the combined size of its contents. For real space consumption, du is the appropriate tool.
sudo du -sh /var
The above command produces a summarised, human-readable total such as 85G /var. The -s flag limits output to a single grand total and -h formats values in K, M or G units. For an individual file, du -sh /var/log/syslog might report 12M /var/log/syslog, while ls -lh /var/log/syslog adds ownership and timestamps to the same figure.
Drilling Down to Find Where Space Has Gone
When a file system is full and the need is to locate exactly where the space has accumulated, du can be made progressively more revealing. The command sudo du -h --max-depth=1 /var lists first-level subdirectories with sizes, potentially showing 77G /var/lib, 5.0G /var/cache and 3.3G /var/log. To surface the biggest consumers quickly, piping to sort and head works well:
sudo du -h /var/ | sort -rh | head -10
Adding the -a flag includes individual files alongside directories in the same output:
sudo du -ah /var/ | sort -rh | head -10
Apparent Size Versus Allocated Disk Space
There is a subtle distinction that sometimes causes confusion. By default, du reports allocated disk usage, which is governed by the file system block size. A single-byte file on a file system with 4 KB blocks still consumes 4 KB of disk. To see the amount of data actually stored rather than allocated, sudo du -sh --apparent-size /var reports the apparent size instead. The df command answers a different question altogether: it shows free and used space per mounted file system, such as /dev/sda1 at 73 per cent usage or /dev/sdb1 mounted on /data with 70 GB free. In practice, du is for locating what consumes space and df is for checking how much remains on each volume.
gdu: A Faster Interactive Alternative
Some administrators prefer a more modern tool for storage investigations, and gdu is a notable option. It is a fast disk usage analyser written in Go with an interactive console interface, designed primarily for SSDs where it can exploit parallel processing to full effect, though it functions on hard drives too with less dramatic speed gains. The binary release can be installed by extracting its .tgz archive:
curl -L https://github.com/dundee/gdu/releases/latest/download/gdu_linux_amd64.tgz | tar xz
chmod +x gdu_linux_amd64
mv gdu_linux_amd64 /usr/bin/gdu
It can also be run directly via Docker without installation:
docker run --rm --init --interactive --tty --privileged
--volume /:/mnt/root ghcr.io/dundee/gdu /mnt/root
In use, gdu scans a directory interactively when run without flags, summarises a target with gdu -ps /some/dir, shows top results with gdu -t 10 / and runs without interaction using gdu -n /. It supports apparent size display, hidden file inclusion, item counts, modification times, exclusions, age filtering and database-backed analysis through SQLite or BadgerDB. The project documentation notes that hard links are counted only once and that analysis data can be exported as JSON for later review.
Unpacking TGZ Archives
A brief note on the tar command is useful here, since it appears throughout Linux administration, including in the gdu installation step above. A .tgz file is simply a GZIP-compressed tar archive, and the standard way to extract one is:
tar zxvf archive.tgz
Modern GNU tar can detect the compression type automatically, so the -z flag is often optional:
tar xvf archive.tgz
To extract into a specific directory rather than the current working directory, the -C option takes a destination path:
tar zxvf archive.tgz -C /path/to/destination/
To inspect the contents of a .tgz file without extracting it, the t (list) flag replaces x (extract):
tar ztvf archive.tgz
The tar command was first introduced in the seventh edition of Unix in January 1979 and its name comes from its original purpose as a Tape ARchiver. Despite that origin, modern tar reads from and writes to files, pipes and remote devices with equal facility.
Mounting NFS Shares and Optical Media
Installing NFS Client Tools
NFS remains common on Linux and Unix-like systems, allowing remote directories to be mounted locally and treated as though they were native file systems. Before a client can mount an NFS export, the client packages must be installed. On Ubuntu and Debian, that means:
sudo apt update
sudo apt install nfs-common
On Fedora and RHEL-based distributions, the equivalent is:
sudo dnf install nfs-utils
Once installed, showmount -e 10.10.0.10 can list available exports from a server, returning output such as /backups 10.10.0.0/24 and /data *.
Mounting an NFS Share Manually
Mounting an NFS share follows the same broad pattern as mounting any other file system. First, create a local mount point:
sudo mkdir -p /var/backups
Then mount the remote export, specifying the file system type explicitly:
sudo mount -t nfs 10.10.0.10:/backups /var/backups
A successful command produces no output. Verification is done with mount | grep nfs or df -h, after which the local directory acts as the root of the remote file system for all practical purposes.
Persisting NFS Mounts Across Reboots
Since a manual mount does not survive a reboot, persistent setups use /etc/fstab. An appropriate entry looks like:
10.10.0.10:/backups /var/backups nfs defaults,nofail,_netdev 0 0
The nofail option prevents a boot failure if the NFS server is unavailable when the machine starts. The _netdev flag marks the mount as network-dependent, ensuring the system defers the operation until the network stack is available. Running sudo mount -a tests the entry without rebooting.
Troubleshooting Common NFS Errors
NFS problems are often predictable. A "Permission denied" error usually means the server export in /etc/exports does not include the client, and reloading exports with sudo exportfs -ar is frequently the remedy. "RPC: Program not registered" indicates the NFS service is not running on the server, in which case sudo systemctl restart nfs-server applies. A "Stale file handle" error generally follows a server reboot or a deleted file and is cleared by unmounting and remounting. Timeouts and "Server not responding" messages call for checking network connectivity, confirming that firewall rules permit access to port 111 (rpcbind, required for NFSv3) and port 2049 (NFS itself), and verifying NFS version compatibility using the vers=3 or vers=4 mount option. NFSv4 requires only port 2049, while NFSv2 and NFSv3 also require port 111. To detach a share, sudo umount /var/backups is the standard route, with fuser -m /var/backups helping identify processes that are blocking the unmounting process.
Mounting Optical Media
CDs and DVDs are less central than they once were, but some systems still need to read them. After inserting a disc, blkid can identify the block device path, which is typically /dev/sr0, and will report the file system type as iso9660. With a mount point created using sudo mkdir /mnt/cdrom, the disc is mounted with:
sudo mount /dev/sr0 /mnt/cdrom
The warning device write-protected, mounted read-only is expected for optical media and can be disregarded. CDs and DVDs use the ISO 9660 file system, a data-exchange standard designed to be readable across operating systems. Once mounted, the disc contents are accessible under /mnt/cdrom, and sudo umount /mnt/cdrom detaches it cleanly when work is complete.
Transferring Files Securely and Efficiently
Copying Files with scp
scp (Secure Copy) transfers files and directories between hosts over SSH, encrypting both data and authentication credentials in transit. Its basic syntax is:
scp [OPTIONS] [[user@]host:]source [[user@]host:]destination
The colon is how scp distinguishes between local and remote paths: a path without a colon is local. A typical upload from a local machine to a remote host looks like:
scp file.txt remote_username@10.10.0.2:/remote/directory
A download from a remote host to the local machine reverses the argument order:
scp remote_username@10.10.0.2:/remote/file.txt /local/directory
Commonly used options include -r for recursive directory copies, -p to preserve metadata such as modification times and permissions, -C for compression, -i for a specific private key, -l to cap bandwidth in Kbit/s and the uppercase -P to specify a non-standard SSH port. It is also possible to copy between two remote hosts directly, routing the transfer through the local machine with the -3 flag.
The Protocol Change in OpenSSH 9.0
There is an important change in modern OpenSSH that administrators should be aware of. From OpenSSH 9.0 onward, the scp command uses the SFTP protocol internally by default rather than the older SCP/RCP protocol, which is now considered outdated. The command behaves identically from the user's perspective, but if an older server requires the legacy protocol, the -O flag forces it. For advanced requirements such as resumable transfers or incremental directory synchronisation, rsync is generally the better fit, particularly for large directory trees.
Throttling rsync to Protect Bandwidth
Even with rsync, raw speed is not always desirable. A backup script consuming all available bandwidth can disrupt other services on the same network link, so --bwlimit is often essential. The basic syntax is:
rsync --bwlimit=KBPS source destination
The value is in units of 1,024 bytes unless an explicit suffix is added. A fractional value is also valid: --bwlimit=1.5m sets a cap of 1.5 MB/s. A local transfer capped at 1,000 KB/s looks like:
rsync --bwlimit=1000 /path/to/source /path/to/dest/
And a remote backup:
rsync --bwlimit=1000 /var/www/html/ backups@server1.example.com:~/mysite.backups/
The man page for rsync explains that --bwlimit works by limiting the size of the blocks rsync writes and then sleeping between writes to achieve the target average. Some volume undulation is therefore normal in practice.
Managing I/O Priority with ionice
Bandwidth is only one dimension of the load a transfer places on a system. Disk I/O scheduling may also need attention, particularly on busy servers running other workloads. The ionice utility adjusts the I/O scheduling class and priority of a process without altering its CPU priority. For instance:
/usr/bin/ionice -c2 -n7 rsync --bwlimit=1000 /path/to/source /path/to/dest/
This runs the rsync process in best-effort I/O class (-c2) at the lowest priority level (-n7), combining transfer rate limiting with reduced I/O priority. The scheduling classes are: 0 (none), 1 (real-time), 2 (best-effort) and 3 (idle), with priority levels 0 to 7 available for the real-time and best-effort classes.
Together, --bwlimitand ionice provide complementary controls over exactly how much resource a routine transfer is permitted to consume at any given time.
Setting Up Bash Tab Completion
On Ubuntu and related distributions, Bash programmable completion is provided by the bash-completion package. If tab completion does not function as expected in a new installation or container environment, the following commands will install the necessary support:
sudo apt update
sudo apt upgrade
sudo apt install bash-completion
The package places a shell script at /etc/profile.d/bash_completion.sh. To ensure it is loaded in shell startup, the following appends the source line to .bashrc:
echo "source /etc/profile.d/bash_completion.sh" >> ~/.bashrc
A conditional form avoids duplicating the line on repeated runs:
grep -wq '^source /etc/profile.d/bash_completion.sh' ~/.bashrc
|| echo 'source /etc/profile.d/bash_completion.sh' >> ~/.bashrc
The script is typically loaded automatically in a fresh login shell, but source /etc/profile.d/bash_completion.sh activates it immediately in the current session. Once active, pressing Tab after partial input such as sudo apt i or cat /etc/re completes commands and paths against what is actually installed. Bash also supports simple custom completions: complete -W 'google.com cyberciti.biz nixcraft.com' host teaches the shell to offer those three domains after typing host and pressing Tab, which illustrates how the feature can be extended to match the patterns of repeated daily work.
Installing Snap on Debian
Snap is a packaging format developed by Canonical that bundles an application together with all of its dependencies into a single self-contained package. Snaps update automatically, roll back gracefully on failure and are distributed through the Snap Store, which carries software from both Canonical and independent publishers. The background service that manages them, snapd, is pre-installed on Ubuntu but requires a manual setup step on Debian.
On Debian 9 (Stretch) and newer, snap can be installed directly from the command line:
sudo apt update
sudo apt install snapd
After installation, logging out and back in again, or restarting the system, is necessary to ensure that snap's paths are updated correctly in the environment. Once that is done, install the snapd snap itself to obtain the latest version of the daemon:
sudo snap install snapd
To verify that the setup is working, the hello-world snap provides a straightforward test:
sudo snap install hello-world
hello-world
A successful run prints Hello World! to the terminal. Note that snap is not available on Debian versions before 9. If a snap installation produces an error such as snap "lxd" assumes unsupported features, the resolution is to ensure the core snap is present and current:
sudo snap install core
sudo snap refresh core
On desktop systems, the Snap Store graphical application can then be installed with sudo snap install snap-store, providing a point-and-click interface for browsing and managing snaps alongside the command-line tools.
Increasing the Root Partition Size on Fedora with LVM
Fedora's default installer has used LVM (Logical Volume Manager) for many years, dividing the available disk into a volume group containing separate logical volumes for root (/), home (/home) and swap. This arrangement makes it straightforward to redistribute space between volumes without repartitioning the physical disk, which is a significant advantage over a fixed partition layout. Note that Fedora 33 and later default to Btrfs without LVM for new installations, so the steps below apply to systems that were installed with LVM, including pre-Fedora 33 installs and any system where LVM was selected manually.
Because the root file system is in active use while the system is running, resizing it safely requires booting from a Fedora Live USB stick rather than the installed system. Once booted from the live environment, open a terminal and begin by checking the volume group:
sudo vgs
Output such as the following shows the volume group name, total size and, crucially, how much free space (VFree) is unallocated:
VG #PV #LV #SN Attr VSize VFree
fedora 1 3 0 wz--n- <237.28g 0
Before proceeding, confirm the exact device mapper paths for the root and home logical volumes by running fdisk -l, since the volume group name varies between installations. Common names include /dev/mapper/fedora-root and /dev/mapper/fedora-home, though some systems use fedora00 or another prefix.
When Free Space Is Already Available
If VFree shows unallocated space in the volume group, the root logical volume can be extended directly and the file system resized in a single command:
lvresize -L +5G --resizefs /dev/mapper/fedora-root
The --resizefs flag instructs lvresize to resize the file system at the same time as the logical volume, removing the need to run resize2fs separately.
When There Is No Free Space
If VFree is zero, space must first be reclaimed from another logical volume before it can be given to root. The most common approach is to shrink the home logical volume, which typically holds the most available headroom. Shrinking a file system involves data moving on disk, so the operation requires the volume to be unmounted, which is why the live environment is essential. To take 10 GB from home:
lvresize -L -10G --resizefs /dev/mapper/fedora-home
Once that completes, the freed space appears as VFree in vgs and can be added to the root volume:
lvresize -L +10G --resizefs /dev/mapper/fedora-root
Both steps use --resizefs so that the file system boundaries are updated alongside the logical volume boundaries. After rebooting back into the installed system, df -h will confirm the new sizes are in effect.
Keeping a Linux System Well Maintained
The commands and configurations covered above form a coherent body of everyday Linux administration practice. Knowing where installed kernels are recorded, how to measure real disk usage rather than directory metadata, how to attach local and network file systems correctly, how to extract archives and move data securely without disrupting shared resources, how to make the shell itself more productive, how to extend a Debian system with snap packages and how to redistribute disk space between LVM volumes on Fedora converts a scattered collection of one-liners into a reliable working toolkit. Each topic interconnects naturally with the others: a kernel query clarifies what system you are managing, disk investigation reveals whether a file system has room for what you plan to transfer, NFS mounting determines where that transfer will land and bandwidth control determines what impact it will have while it runs.
Installing PowerShell on Linux Mint for some cross-platform testing
Given how well shell scripting works on Linux and my familiarity with it, the need to install PowerShell on a Linux system may seem surprising. However, this was part of some testing that I wanted to do on a machine that I controlled before moving the code to a client's system. The first step was to ensure that any prerequisites were in place:
sudo apt update
sudo apt install -y wget apt-transport-https software-properties-common
After that, the next moves were to download and install the required package for instating Microsoft repository details:
wget -q https://packages.microsoft.com/config/ubuntu/24.04/packages-microsoft-prod.deb
sudo dpkg -i packages-microsoft-prod.deb
Then, I could install PowerShell itself:
sudo apt update
sudo apt install -y powershell
When it was in place, issuing the following command started up the extra shell for what I needed to do:
pwsh
During my investigations, I found that my local version of PowerShell was not the same as on the client's system, meaning that any code was not as portable as I might have expected, Nevertheless, it is good to have this for future reference and proves how interoperable Microsoft has needed to become.
Ansible automation for Linux Mint updates with repository failover handling
Recently, I had a Microsoft PPA output disrupt an Ansible playbook mediated upgrade process for my main Linux workstation. Thus, I ended up creating a failover for this situation, and the first step in the playbook was to define the affected repo:
vars:
microsoft_repo_url: "https://packages.microsoft.com/repos/code/dists/stable/InRelease"
The next move was to start defining tasks, with the first testing the repo to pick up any lack of responsiveness and flag that for subsequent operations.
tasks:
- name: Check Microsoft repository availability
uri:
url: "{{ microsoft_repo_url }}"
method: HEAD
return_content: no
timeout: 10
register: microsoft_repo_check
failed_when: false
- name: Set flag to skip Microsoft updates if unreachable
set_fact:
skip_microsoft_repos: "{{ microsoft_repo_check.status is not defined or microsoft_repo_check.status != 200 }}"
In the event of a failure, the next task was to disable the repo to allow other processing to take place. This was accomplished by temporarily renaming the relevant files under /etc/apt/sources.list.d/.
- name: Temporarily disable Microsoft repositories
become: true
shell: |
for file in /etc/apt/sources.list.d/microsoft*.list; do
[ -f "$file" ] && mv "$file" "${file}.disabled"
done
for file in /etc/apt/sources.list.d/vscode*.list; do
[ -f "$file" ] && mv "$file" "${file}.disabled"
done
when: skip_microsoft_repos | default(false)
changed_when: false
With that completed, the rest of the update actions could be performed near enough as usual.
- name: Update APT cache (retry up to 5 times)
apt:
update_cache: yes
register: apt_update_result
retries: 5
delay: 10
until: apt_update_result is succeeded
- name: Perform normal upgrade
apt:
upgrade: yes
register: apt_upgrade_result
retries: 3
delay: 10
until: apt_upgrade_result is succeeded
- name: Perform dist-upgrade with autoremove and autoclean
apt:
upgrade: dist
autoremove: yes
autoclean: yes
register: apt_dist_result
retries: 3
delay: 10
until: apt_dist_result is succeeded
After those, another renaming operation restores the earlier filenames to what they were.
- name: Re-enable Microsoft repositories
become: true
shell: |
for file in /etc/apt/sources.list.d/*.disabled; do
base="$(basename "$file" .disabled)"
if [[ "$base" == microsoft* || "$base" == vscode* || "$base" == edge* ]]; then
mv "$file" "/etc/apt/sources.list.d/$base"
fi
done
when: skip_microsoft_repos | default(false)
changed_when: false
Needless to say, this disabling only happens in the event of there being a system failure. Otherwise, the steps are skipped and everything else is completed as it should be. While there is some cause for extended the repository disabling actions to other third repos as well, that is something that I will leave aside for now. Even this shows just how much can be done using Ansible playbooks and how much automation can be achieved. As it happens, I even get Flatpaks updated in much the same way:
- name: Ensure Flatpak is installed
apt:
name: flatpak
state: present
update_cache: yes
cache_valid_time: 3600
- name: Update Flatpak remotes
command: flatpak update --appstream -y
register: flatpak_appstream
changed_when: "'Now at' in flatpak_appstream.stdout"
failed_when: flatpak_appstream.rc != 0
- name: Update all Flatpak applications
command: flatpak update -y
register: flatpak_result
changed_when: "'Now at' in flatpak_result.stdout"
failed_when: flatpak_result.rc != 0
- name: Install unused Flatpak applications
command: flatpak uninstall --unused
register: flatpak_cleanup
changed_when: "'Nothing' not in flatpak_cleanup.stdout"
failed_when: flatpak_cleanup.rc != 0
- name: Repair Flatpak installations
command: flatpak repair
register: flatpak_repair
changed_when: flatpak_repair.stdout is search('Repaired|Fixing')
failed_when: flatpak_repair.rc != 0
The ability to call system commands as you see in the above sequence is an added bonus, though getting the response detection completely sorted remains an outstanding task. All this has only scratched the surface of what is possible.
Automating Positron and RStudio updates on Linux Mint 22
Elsewhere, I have written about avoiding manual updates with VSCode and VSCodium. Here, I come to IDE's produced by Posit, formerly RStudio, for data science and analytics uses. The first is a more recent innovation that works with both R and Python code natively, while the second has been around for much longer and focusses on native R code alone, though there are R packages allowing an interface of sorts with Python. Neither are released via a PPA, necessitating either manual downloading or the scripted approach taken here for a Linux system. Each software tool will be discussed in turn.
Positron
Now, we work through a script that automates the upgrade process for Positron. This starts with a shebang line calling the bash executable before moving to a line that adds safety to how the script works using a set statement. Here, the -e switch triggers exiting whenever there is an error, halting the script before it carries on to perform any undesirable actions. That is followed by the -u switch that causes errors when unset variables are called; normally these would be assigned a missing value, which is not desirable in all cases. Lastly, the -o pipefail switch causes a pipeline (cmd1 | cmd2 | cm3) to fail if any command in the pipeline produces an error, which can help debugging because the error is associated with the command that fails to complete.
#!/bin/bash
set -euo pipefail
The next step then is to determine the architecture of the system on which the script is running so that the correct download is selected.
ARCH=$(uname -m)
case "$ARCH" in
x86_64) POSIT_ARCH="x64" ;;
aarch64|arm64) POSIT_ARCH="arm64" ;;
*) echo "Unsupported arch: $ARCH"; exit 1 ;;
esac
Once that completes, we define the address of the web page to be interrogated and the path to the temporary file that is to be downloaded.
RELEASES_URL="https://github.com/posit-dev/positron/releases"
TMPFILE="/tmp/positron-latest.deb"
Now, we scrape the page to find the address of the latest DEB file that has been released.
echo "Finding latest Positron .deb for $POSIT_ARCH..."
DEB_URL=$(curl -fsSL "$RELEASES_URL" \
| grep -Eo "https://cdn\.posit\.co/[A-Za-z0-9/_\.-]+Positron-[0-9\.~-]+-${POSIT_ARCH}\.deb" \
| head -n 1)
If that were to fail, we get an error message produced before the script is aborted.
if [ -z "${DEB_URL:-}" ]; then
echo "Could not find a .deb link for ${POSIT_ARCH} on the releases page"
exit 1
fi
Should all go well thus far, we download the latest DEB file using curl.
echo "Downloading: $DEB_URL"
curl -fL "$DEB_URL" -o "$TMPFILE"
When the download completes, we try installing the package using apt, much like we do with a repo, apart from specifying an actual file path on our system.
echo "Installing Positron..."
sudo apt install -y "$TMPFILE"
Following that, we delete the installation file and issue a message informing the user of the task's successful completion.
echo "Cleaning up..."
rm -f "$TMPFILE"
echo "Done."
When I do this, I tend to find that the Python REPL console does not open straight away, causing me to shut down Positron and leaving things for a while before starting it again. There may be temporary files that need to be expunged and that needs its own time. Someone else might have a better explanation that I am happy to use if that makes more sense than what I am suggesting. Otherwise, all works well.
RStudio
A lot of the same processing happens during the script updating RStudio, so we will just cover the differences. The set -x statement ensures that every command is printed to the console for the debugging that was needed while this was being developed. Otherwise, much code, including architecture detection, is shared between the two apps.
#!/bin/bash
set -euo pipefail
set -x
# --- Detect architecture ---
ARCH=$(uname -m)
case "$ARCH" in
x86_64) RSTUDIO_ARCH="amd64" ;;
aarch64|arm64) RSTUDIO_ARCH="arm64" ;;
*) echo "Unsupported architecture: $ARCH"; exit 1 ;;
esac
Figuring out the distro version and the web page to scrape was where additional effort was needed, and that is reflected in some of the code that follows. Otherwise, many of the ideas applied with Positron also have a place here.
# --- Detect Ubuntu base ---
DISTRO=$(grep -oP '(?<=UBUNTU_CODENAME=).*' /etc/os-release || true)
[ -z "$DISTRO" ] && DISTRO="noble"
# --- Define paths ---
TMPFILE="/tmp/rstudio-latest.deb"
LOGFILE="/var/log/rstudio_update.log"
echo "Detected Ubuntu base: ${DISTRO}"
echo "Fetching latest version number from Posit..."
# --- Get version from Posit's official RStudio Desktop page ---
VERSION=$(curl -s https://posit.co/download/rstudio-desktop/ \
| grep -Eo 'rstudio-[0-9]+\.[0-9]+\.[0-9]+-[0-9]+' \
| head -n 1 \
| sed -E 's/rstudio-([0-9]+\.[0-9]+\.[0-9]+-[0-9]+)/\1/')
if [ -z "$VERSION" ]; then
echo "Error: Could not extract the latest RStudio version number from Posit's site."
exit 1
fi
echo "Latest RStudio version detected: ${VERSION}"
# --- Construct download URL (Jammy build for Noble until Noble builds exist) ---
BASE_DISTRO="jammy"
BASE_URL="https://download1.rstudio.org/electron/${BASE_DISTRO}/${RSTUDIO_ARCH}"
FULL_URL="${BASE_URL}/rstudio-${VERSION}-${RSTUDIO_ARCH}.deb"
echo "Downloading from:"
echo " ${FULL_URL}"
# --- Validate URL before downloading ---
if ! curl --head --silent --fail "$FULL_URL" >/dev/null; then
echo "Error: The expected RStudio package was not found at ${FULL_URL}"
exit 1
fi
# --- Download and install ---
curl -L "$FULL_URL" -o "$TMPFILE"
echo "Installing RStudio..."
sudo apt install -y "$TMPFILE" | tee -a "$LOGFILE"
# --- Clean up ---
rm -f "$TMPFILE"
echo "RStudio update to version ${VERSION} completed successfully." | tee -a "$LOGFILE"
When all ended, RStudio worked without a hitch, leaving me to move on to other things. The next time that I am prompted to upgrade the environment, this is the way I likely will go.
Command line installation and upgrading of VSCode and VSCodium on Windows, macOS and Linux
Downloading and installing software packages from a website is all very well until you need to update them. Then, a single command streamlines the process significantly. Given that VSCode and VSCodium are updated regularly, this becomes all the more pertinent and explains why I chose them for this piece.
Windows
Now that Windows 10 is more or less behind us, we can focus on Windows 11. That comes with the winget command by default, which is handy because it allows command line installation of anything that is in the Windows store, which includes VSCode and VSCodium. The commands can be as simple as these:
winget install VisualStudioCode
winget install VSCodium.VSCodium
The above is shorthand for this, though:
winget install --id VisualStudioCode
winget install --id VSCodium.VSCodium
If you want exact matches, the above then becomes:
winget install -e --id VisualStudioCode
winget install -e --id VSCodium.VSCodium
For upgrades, this is what is needed:
winget upgrade Microsoft.VisualStudioCode
winget upgrade VSCodium.VSCodium
Even better, you can do an upgrade everything at once operation:
winget upgrade --all
The last part certainly is better than the round trip to a website and back to going through an installation GUI. There is a lot less mouse clicking for one thing.
macOS
On macOS, you need to have Homebrew installed to make things more streamlined. To complete that, you need to run the following command (which may need you to enter your system password to get things to happen):
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Then, you can execute one or both of these in the Terminal app, perhaps having to authorise everything with your password when requested to do so:
brew install --cask visual-studio-code
brew install --cask vscodium
The reason for the -cask switch is that these are apps that you want to go into the correct locations on macOS as well as having their icons appear in Launchpad. Omitting it is fine for command line utilities, but not for these.
To update and upgrade everything that you have installed via Homebrew, just issue the following in a terminal session:
brew update && brew upgrade
Debian, Ubuntu & Linux Mint
Like any other Debian or Ubuntu derivative, Linux Mint has its own in-built package management system via apt. Other Linux distributions have their own way of doing things (Fedora and Arch come to mind here), yet the essential idea is similar in many cases. Because there are a number of steps, I have split out VSCode from VSCodium for added clarity. Because of the way that things are set up, one or both apps can be updated using the usual apt commands without individual attention.
VSCode
The first step is to download the repository key using the following command:
wget -qO- https://packages.microsoft.com/keys/microsoft.asc \
| gpg --dearmor > packages.microsoft.gpg
sudo install -D -o root -g root -m 644 packages.microsoft.gpg /etc/apt/keyrings/packages.microsoft.gpg
Then, you can add the repository like this:
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/packages.microsoft.gpg] \
https://packages.microsoft.com/repos/code stable main" \
| sudo tee /etc/apt/sources.list.d/vscode.list
With that in place, the last thing that you need to do is issue the command for doing the installation from the repository:
sudo apt update; sudo apt install code
Above, I have put two commands together: one to update the repository and another to do the installation.
VSCodium
Since the VSCodium process is similar, here are the three commands together: one for downloading the repository key, another that adds the new repository and one more to perform the repository updates and subsequent installation:
curl -fSsL https://gitlab.com/paulcarroty/vscodium-deb-rpm-repo/raw/master/pub.gpg \
| sudo gpg --dearmor | sudo tee /usr/share/keyrings/vscodium-archive-keyring.gpg >/dev/null
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/vscodium-archive-keyring.gpg] \
https://download.vscodium.com/debs vscodium main" \
| sudo tee /etc/apt/sources.list.d/vscodium.sources
sudo apt update; sudo apt install codium
After the three steps have completed successfully, VSCodium is installed and available to use on your system, and is accessible through the menus too.
Managing Python projects with Poetry
Python Poetry has become a popular choice for managing Python projects because it unifies tasks that once required several tools. Instead of juggling pip for installation, virtualenv for isolation and setuptools for packaging, Poetry brings these strands together and aims to make everyday development feel predictable and tidy. It sits in the same family of all-in-one managers as npm for JavaScript and Cargo for Rust, offering a coherent workflow that spans dependency declaration, environment management and package publishing.
At the heart of Poetry is a simple idea: declare what a project needs in one place and let the tool do the orchestration. Projects describe their dependencies, development tools and metadata in a single configuration file, and Poetry ensures that what is installed on one machine can be replicated on another without nasty surprises. That reliability comes from the presence of a lock file. Once dependencies are resolved, their exact versions are recorded, so future installations repeat the same outcome. The intent here is not only convenience but determinism, helping teams avoid the "works on my machine" refrain that haunts software work.
Core Concepts: Configuration and Lock Files
Two files do the heavy lifting. The pyproject.toml file is where a project announces its name, version and description, as well as the dependencies required to run and to develop it. The poetry.lock file captures the concrete resolution of those requirements at a particular moment. Together, they give you an auditable, repeatable picture of your environment. The structure of TOML keeps the configuration readable, and it spares developers from spreading equivalent settings across setup.cfg, setup.py and requirements.txt. A minimal example shows how this looks in practice.
[tool.poetry]
name = "my_project"
version = "0.1.0"
description = "Example project using Poetry"
authors = ["John <john@example.com>"]
[tool.poetry.dependencies]
python = "^3.10"
requests = "^2.31.0"
[tool.poetry.dev-dependencies]
pytest = "^8.0.0"
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
Essential Commands
Working with Poetry day to day quickly becomes a matter of a few memorable commands. Initialising a project configuration starts with poetry init, which steps through the creation of pyproject.toml interactively. Adding a dependency is handled by poetry add followed by the package name. Installing everything described in the configuration is done with poetry install, which writes or updates the lock file. When it is time to refresh dependencies within permitted version ranges, poetry update re-resolves and updates what's installed. Removing a dependency is poetry remove, followed by the package name. For environment management, poetry shell opens a shell inside the virtual environment managed by Poetry, and poetry run allows execution of commands within that same environment without entering a shell. Building distributions is as simple as poetry build, which produces a wheel and a source archive, and publishing to the Python Package Index is managed by poetry publish with credentials or an API token.
Advantages and Considerations
There are clear advantages to taking this route. The dependency experience is simplified because you do not need to keep updating a requirements.txt file by hand. With a lock file in place, environments are reproducible across developer machines and continuous integration runners, which stabilises builds and testing. Packaging is integrated rather than an extra chore, so producing and publishing a release becomes a repeatable process that sits naturally alongside development. Virtual environments are created and activated on demand, keeping projects isolated from one another with little ceremony. The configuration in TOML has the benefit of being structured and human-readable, which reduces the likelihood of configuration drift.
There are also points to consider before adopting Poetry. Projects that are deeply invested in setup.py or complex legacy build pipelines may need a clean migration to pyproject.toml for avoiding clashes. Developers who prefer manual venv and pip workflows can find Poetry opinionated at first because it expects to be responsible for the environment and dependency resolution. It is also designed with modern Python versions in mind, with examples here using Python 3.10.
Migration from pip and requirements.txt
For teams arriving from pip and requirements.txt, moving to Poetry can be done in measured steps. The starting point is installation. Poetry provides an installer script that sets up the tool for your user account.
curl -sSL https://install.python-poetry.org | python3 -
If the installer does not add Poetry to your PATH, adding $HOME/.local/bin to PATH resolves that, after which poetry --version confirms the installation. From the root of your existing project, poetry init creates a new pyproject.toml and invites you to provide metadata and dependencies. If you already maintain requirements.txt files for production and development dependencies, Poetry can ingest those in one sweep. A single file can be imported with poetry add $(cat requirements.txt). Where development dependencies live in a separate file, they can be added into Poetry's dev group with poetry add --group dev $(cat dev-requirements.txt). Once added, Poetry resolves and pins exact versions, leaving a lock file behind to capture the resolution. After verifying that everything installs and tests pass, it becomes safe to retire earlier environment artefacts. Many teams remove requirements.txt entirely if they plan to rely solely on Poetry, deleting any remnants of Pipfile and Pipfile.lock that were left by Pipenv and migrate metadata away from setup.py or setup.cfg in favour of pyproject.toml. With that done, using the environment becomes routine. Opening a shell inside the virtual environment with poetry shell makes commands such as python or pytest use the isolated interpreter. If you prefer to avoid entering a shell, poetry run python script.py or poetry run pytest executes the command in the right context.
Package Publishing
Publishing a package is one of the areas where Poetry streamlines the steps. Accurate metadata in pyproject.toml is important, so name, version, description and other fields should be up-to-date. An example configuration shows commonly used fields.
[tool.poetry]
name = "example-package"
version = "1.0.0"
description = "A simple example package"
authors = ["John <john@example.com>"]
license = "MIT"
readme = "README.md"
homepage = "https://github.com/john/example-package"
repository = "https://github.com/john/example-package"
keywords = ["example", "poetry"]
With metadata set, building the distribution is handled by poetry build, which creates a dist directory containing a .tar.gz source archive and a .whl wheel file. Uploading to the official Python Package Index can be done with username and password, though API tokens are the recommended method because they can be scoped and revoked without affecting account credentials. Configuring a token is done once with poetry config pypi-token.pypi, after which poetry publish will use it to upload. When testing a release before publishing for real, TestPyPI provides a safer target. Poetry supports multiple sources and can be directed to use TestPyPI by declaring it as a repository and then publishing to it.
[[tool.poetry.source]]
name = "testpypi"
url = "https://test.pypi.org/legacy/"
poetry publish -r testpypi
Once uploaded, it is sensible to confirm that the package can be installed in a clean environment using pip install example-package, which verifies that dependencies are correctly declared and wheels are intact.
Continuous Integration with GitHub Actions
Beyond local steps, automation closes the loop. Adding a continuous integration workflow that installs dependencies, runs tests and publishes on a tagged release keeps quality checks and distribution consistent. GitHub Actions provides a hosted environment where Poetry can be installed quickly, dependencies cached and tests executed. A straightforward workflow listens for tags that begin with v, such as v1.0.0, then builds and publishes the package once tests pass. The workflow file sits under .github/workflows and looks like this.
name: Publish to PyPI
on:
push:
tags:
- "v*"
jobs:
build:
runs-on: ubuntu-latest
steps:
- name: Check out repository
uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.10"
- name: Install Poetry
run: |
curl -sSL https://install.python-poetry.org | python3 -
echo "$HOME/.local/bin" >> $GITHUB_PATH
- name: Install dependencies
run: poetry install --no-interaction --no-root
- name: Run tests with pytest
run: poetry run pytest --maxfail=1 --disable-warnings -q
- name: Build package
run: poetry build
- name: Publish to PyPI
if: startsWith(github.ref, 'refs/tags/v')
env:
POETRY_PYPI_TOKEN_PYPI: ${{ secrets.PYPI_TOKEN }}
run: poetry publish --no-interaction --username __token__ --password $POETRY_PYPI_TOKEN_PYPI
This arrangement checks out the repository, installs a consistent Python version, brings in Poetry, installs dependencies based on the lock file, runs tests, builds distributions and only publishes when the workflow is triggered by a version tag. The API token used for publishing should be stored as a repository secret named PYPI_TOKEN so it is not exposed in the codebase or logs. Creating the tag is done locally with git tag v1.0.0 followed by git push origin v1.0.0, which triggers the workflow and results in a published package, moments later. It is often useful to extend this with a test matrix, so the suite runs across supported Python versions, as well as caching to speed up repeated runs by re-using Poetry and pip caches keyed on the lock file.
Project Structure
Package structure is another place where Poetry encourages clarity. A simple, consistent layout makes maintenance and onboarding easier. A typical library keeps its importable code in a package directory named to match the project name in pyproject.toml, with hyphens translated to underscores. Tests live in a separate tests directory, documentation in docs and examples in a directory of the same name. The repository root contains README.md, a licence file, the lock file and a .gitignore that excludes environment directories and build artefacts. The following tree illustrates a balanced structure for a data-oriented utility library.
data-utils/
├── data_utils/
│ ├── __init__.py
│ ├── core.py
│ ├── io.py
│ ├── analysis.py
│ └── cli.py
├── tests/
│ ├── __init__.py
│ ├── test_core.py
│ └── test_analysis.py
├── docs/
│ ├── index.md
│ └── usage.md
├── examples/
│ └── demo.ipynb
├── README.md
├── LICENSE
├── pyproject.toml
├── poetry.lock
└── .gitignore
Within the package directory, init.py can define a public interface and hide internal details. This allows users of the library to import the essentials without needing to know the module layout.
from .core import clean_data
from .analysis import summarise_data
__all__ = ["clean_data", "summarise_data"]
If the project offers a command-line interface, Poetry makes it simple to declare an entry point, so users can run a console command after installation. The scripts section in pyproject.toml maps a command name to a callable, in this case the main function in a cli module.
[tool.poetry.scripts]
data-utils = "data_utils.cli:main"
A basic CLI might be implemented using Click, passing arguments to internal functions and relaying progress.
import click
from data_utils import core
@click.command()
@click.argument("path")
def main(path):
"""Simple CLI example."""
print(f"Processing {path}...")
core.clean_data(path)
print("Done!")
if __name__ == "__main__":
main()
Git ignores should filter out files that do not belong in version control. A sensible default for a Poetry project is as follows.
__pycache__/
*.pyc
*.pyo
*.pyd
.env
.venv
dist/
build/
*.egg-info/
.cache/
.coverage
- Testing and Documentation
Testing sits comfortably alongside this. Many projects adopt pytest because it is straightforward to use and integrates well with Poetry. Running tests through poetry run pytest ensures the virtual environment is used, and a simple unit test demonstrates the pattern.
from data_utils.core import clean_data
def test_clean_data_removes_nulls():
data = [1, None, 2, None, 3]
cleaned = clean_data(data)
assert cleaned == [1, 2, 3]
Documentation can be kept in Markdown or built with tools. MkDocs and Sphinx are common choices for generating websites from your docs, and both can be installed as development dependencies using Poetry. Including notebooks in an examples directory is helpful for illustrating usage in richer contexts, especially for data science libraries. The README should present the essentials succinctly, covering what the project does, how to install it, a short usage example and pointers for development setup. A licence file clarifies terms of use; MIT and Apache 2.0 are widely used options in open source. Advanced CI: Quality Checks and Multi-version Testing
Once structure, tests and documentation are in order, quality checks can be expanded in the continuous integration workflow. Adding automated formatting, import sorting and linting tightens consistency across contributions. An enhanced workflow uses Black, isort and Flake8 before running tests and building, and also includes a matrix to test across multiple Python versions. It runs on pull requests as well as on tagged pushes, which means code quality and compatibility are verified before merging changes and again before publishing a release.
name: Lint, Test and Publish
on:
push:
tags:
- "v*"
pull_request:
jobs:
build:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ["3.9", "3.10", "3.11"]
steps:
- name: Check out repository
uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- name: Install Poetry
run: |
curl -sSL https://install.python-poetry.org | python3 -
echo "$HOME/.local/bin" >> $GITHUB_PATH
- name: Cache Poetry dependencies
uses: actions/cache@v4
with:
path: |
~/.cache/pypoetry
~/.cache/pip
key: poetry-${{ runner.os }}-${{ hashFiles('**/poetry.lock') }}
restore-keys: |
poetry-${{ runner.os }}-
- name: Install dependencies
run: poetry install --no-interaction --no-root
- name: Check code formatting with Black
run: poetry run black --check .
- name: Check import order with isort
run: poetry run isort --check-only .
- name: Run Flake8 linting
run: poetry run flake8 .
- name: Run tests with pytest
run: poetry run pytest --maxfail=1 --disable-warnings -q
- name: Build package
run: poetry build
- name: Publish to PyPI
if: startsWith(github.ref, 'refs/tags/v')
env:
POETRY_PYPI_TOKEN_PYPI: ${{ secrets.PYPI_TOKEN }}
run: poetry publish --no-interaction --username __token__ --password $POETRY_PYPI_TOKEN_PYPI
This workflow builds on the earlier one by checking style and formatting before tests. If any of those checks fail, the process stops and surfaces the problems in the job logs. Caching based on the lock file reduces the time spent installing dependencies by reusing packages where nothing has changed. The matrix section ensures that the library remains compatible with the declared range of Python versions, which is especially helpful just before a release. It is possible to extend this further with coverage reports using pytest-cov and Codecov, static type checking with mypy, or pre-commit hooks to keep local development consistent with continuous integration. Publishing to TestPyPI in a separate job can help validate packaging without affecting the real index, and once outcomes look good, the main publishing step proceeds when a tag is pushed.
Conclusion
The result of adopting Poetry is a project that states its requirements clearly, installs them reliably and produces distributions without ceremony. For new work, it removes much of the friction that once accompanied Python packaging. For existing projects, the migration path is gentle and reversible, and the gains in determinism often show up quickly in fewer environment-related issues. When paired with a small amount of automation in a continuous integration system, the routine of building, testing and publishing becomes repeatable and visible to everyone on the team. That holds whether the package is destined for internal use on a private index or a public release on PyPI.
SAS Packages: Revolutionising code sharing in the SAS ecosystem
In the world of statistical programming, SAS has long been the backbone of data analysis for countless organisations worldwide. Yet, for decades, one of the most significant challenges facing SAS practitioners has been the efficient sharing and reuse of code. Knowledge and expertise have often remained siloed within individual developers or teams, creating inefficiencies and missed opportunities for collaboration. Enter the SAS Packages Framework (SPF), a solution that changes how SAS professionals share, distribute and utilise code across their organisations and the broader community.
The Problem: Fragmented Knowledge and Complex Dependencies
Anyone who has worked extensively with SAS knows the frustration of trying to share complex macros or functions with colleagues. Traditional code sharing in SAS has been plagued by several issues:
- Dependency nightmares: A single macro often relies on dozens of utility macros working behind the scenes, making it nearly impossible to share everything needed for the code to function properly
- Version control chaos: Keeping track of which version of which macro works with which other components becomes an administrative burden
- Platform compatibility issues: Code that works on Windows might fail on Linux systems and vice versa
- Lack of documentation: Without proper documentation and help systems, even the most elegant code becomes unusable to others
- Knowledge concentration: Valuable SAS expertise remains trapped within individuals rather than being shared with the broader community
These challenges have historically meant that SAS developers spend countless hours reinventing the wheel, recreating functionality that already exists elsewhere in their organisation or the wider SAS community.
The Solution: SAS Packages Framework
The SAS Packages Framework, developed by Bartosz Jabłoński, represents a paradigm shift in how SAS code is organised, shared and deployed. At its core, a SAS package is an automatically generated, single, standalone zip file containing organised and ordered code structures, extended with additional metadata and utility files. This solution addresses the fundamental challenges of SAS code sharing by providing:
- Functionality over complexity: Instead of worrying about 73 utility macros working in the background, you simply share one file and tell your colleagues about the main functionality they need to use.
- Complete self-containment: Everything needed for the code to function is bundled into one file, eliminating the "did I remember to include everything?" problem that has plagued SAS developers for years.
- Automatic dependency management: The framework handles the loading order of code components and automatically updates system options like
cmplib=andfmtsearch=for functions and formats. - Cross-platform compatibility: Packages work seamlessly across different operating systems, from Windows to Linux and UNIX environments.
Beyond Macros: A Spectrum of SAS Functionality
One of the most compelling aspects of the SAS Packages Framework is its versatility. While many code-sharing solutions focus solely on macros, SAS packages support a wide range of SAS functionality:
- User-defined functions (both FCMP and CASL)
- IML modules for matrix programming
- PROC PROTO C routines for high-performance computing
- Custom formats and informats
- Libraries and datasets
- PROC DS2 threads and packages
- Data generation code
- Additional content such as documentation PDF files
This comprehensive approach means that virtually any SAS functionality can be packaged and shared, making the framework suitable for everything from simple utility macros to complex analytical frameworks.
Real-World Applications: From Pharmaceutical Research to General Analytics
The adoption of SAS packages has been particularly notable in the pharmaceutical industry, where code quality, validation and sharing are critical concerns. The PharmaForest initiative, led by PHUSE Japan's Open-Source Technology Working Group, exemplifies how the framework is being used to revolutionise pharmaceutical SAS programming. PharmaForest offers a collaborative repository of SAS packages specifically designed for pharmaceutical applications, including:
- OncoPlotter: A comprehensive package for creating figures commonly used in oncology studies
- SAS FAKER: Tools for generating realistic test data while maintaining privacy
- SASLogChecker: Automated log review and validation tools
- rtfCreator: Streamlined RTF output generation
The initiative's philosophy captures perfectly the spirit of the SAS Packages Framework: "Through SAS packages, we want to actively encourage sharing of SAS know-how that has often stayed within individuals. By doing this, we aim to build up collective knowledge, boost productivity, ensure quality through standardisation and energise our community".
The SASPAC Archive: A Growing Ecosystem
The establishment of SASPAC (SAS Packages Archive) represents the maturation of the SAS packages ecosystem. This dedicated repository serves as the official home for SAS packages, with each package maintained as a separate repository complete with version history and documentation. Some notable packages available through SASPAC include:
- BasePlus: Extends BASE SAS with functionality that many developers find themselves wishing was built into SAS itself. With 12 stars on GitHub, it's become one of the most popular packages in the archive.
- MacroArray: Provides macro array functionality that simplifies complex macro programming tasks, addressing a long-standing gap in SAS's macro language capabilities.
- SQLinDS: Enables SQL queries within data steps, bridging the gap between SAS's powerful data step processing and SQL's intuitive query syntax.
- DFA (Dynamic Function Arrays): Offers advanced data structures that extend SAS's analytical capabilities.
- GSM (Generate Secure Macros): Provides tools for protecting proprietary code while still enabling sharing and collaboration.
Getting Started: Surprisingly Simple
Despite the capabilities, getting started with SAS packages is fairly straightforward. The framework can be deployed in multiple ways, depending on your needs. For a quick test or one-time use, you can enable the framework directly from the web:
filename packages "%sysfunc(pathname(work))";
filename SPFinit url "https://raw.githubusercontent.com/yabwon/SAS_PACKAGES/main/SPF/SPFinit.sas";
%include SPFinit;
For permanent installation, you simply create a directory for your packages and install the framework:
filename packages "C:SAS_PACKAGES";
%installPackage(SPFinit)
Once installed, using packages becomes as simple as:
%installPackage(packageName)
%helpPackage(packageName)
%loadPackage(packageName)
Developer Benefits: Quality and Efficiency
For SAS developers, the framework offers numerous advantages that go beyond simple code sharing:
- Enforced organisation: The package development process naturally encourages better code organisation and documentation practices.
- Built-in testing: The framework includes testing capabilities that help ensure code quality and reliability.
- Version management: Packages include metadata such as version numbers and generation timestamps, supporting modern DevOps practices.
- Integrity verification: The framework provides tools to verify package authenticity and integrity, addressing security concerns in enterprise environments.
- Cherry-picking: Users can load only specific components from a package, reducing memory usage and namespace pollution.
The Future of SAS Code Sharing
The growing adoption of SAS packages represents more than just a new tool, it signals a fundamental shift towards a more collaborative and efficient SAS ecosystem. The framework's MIT licensing and 100% open-source nature ensure that it remains accessible to all SAS users, from individual practitioners to large enterprise installations. This democratisation of advanced code-sharing capabilities levels the playing field and enables even small teams to benefit from enterprise-grade development practices.
As the ecosystem continues to grow, with contributions from pharmaceutical companies, academic institutions and individual developers worldwide, the SAS Packages Framework is proving that the future of SAS programming lies not in isolated development, but in collaborative, community-driven innovation.
For SAS practitioners looking to modernise their development practices, improve code quality and tap into the collective knowledge of the global SAS community, exploring SAS packages isn't just an option, it's becoming an essential step towards more efficient and effective statistical programming.
What to do when the externally-managed environment error appears while using pip to install Python packages on Linux Mint 22
After upgrading to Linux Mint 22, the following message appeared when attempting to install Python packages using the pip command:
error: externally-managed-environment
× This environment is externally managed
╰─> To install Python packages system-wide, try apt install
python3-xyz, where xyz is the package you are trying to
install.
If you wish to install a non-Debian-packaged Python package,
create a virtual environment using python3 -m venv path/to/venv.
Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make
sure you have python3-full installed.
If you wish to install a non-Debian packaged Python application,
it may be easiest to use pipx install xyz, which will manage a
virtual environment for you. Make sure you have pipx installed.
See /usr/share/doc/python3.12/README.venv for more information.
note: If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages.
hint: See PEP 668 for the detailed specification.
This will frustrate anyone following how-tos on the web, so users will need to know about it. On something like Linux Mint, the repositories may not be as up-to-date as PyPI, so picking up the very latest version has its advantages. Thus, I initially used the unrecommended --break-system-packages switch to get things going as before, since doing never broke anything before. While the way of working feels like an overkill in some ways, using pipx probably is the way forward as long as things work as I want them to do.
There is wisdom in using virtual environments too, especially when AI models are involved. For most of what I get to do, that may be getting too elaborate. Then, deleting or renaming the message file in /usr/lib/python3.12/EXTERNALLY-MANAGED is tempting if that gets around things, as retrograde as that probably is. After all, I never broke anything before this message started to appear, possibly since my interests are data related.
Upgrading a web server from Debian 11 to Debian 12
While Debian 12 may be with us since the middle of 2023 and Debian 13 is due in the middle of next year, it has taken me until now to upgrade one of my web servers. The tardiness may have something to do with a mishap on another system that resulted in a rebuild, something to avoid it at all possible.
Nevertheless, I went and had a go with the aforementioned web server after doing some advance research. Thus, I can relate the process that you find here in the knowledge that it worked for me. Also, I will have it on file for everyone's future reference. The first step is to ensure that the system is up-to-date by executing the following commands:
sudo apt update
sudo apt upgrade
sudo apt dist-upgrade
Next, it is best to remove extraneous packages using these commands:
sudo apt --purge autoremove
sudo apt autoclean
Once you have backed up important data and configuration files, you can move to the first step of the upgrade process. This involves changing the repository locations from what is there for bullseye (Debian 11) to those for bookworm (Debian 12). Issuing the following commands will accomplish this:
sudo sed -i 's/bullseye/bookworm/g' /etc/apt/sources.list
sudo sed -i 's/bullseye/bookworm/g' /etc/apt/sources.list.d/*
In my case, I found the second of these to be extraneous since everything was included in the single file. Also, Debian 12 has added a new non-free repository called non-free-firmware. This can be added at this stage by manual editing of the above. In my case, I did it later because the warning message only began to appear at that stage.
Once the repository locations, it is time to update the package information using the following command:
sudo apt update
Then, it is time to first perform a minimal upgrade using the following command, that takes a conservative approach by updating existing packages without installing any new ones:
sudo apt upgrade --without-new-pkgs
Once that has completed, one needs to issue the following command to install new packages if needed for dependencies and even remove incompatible or unnecessary ones, as well as performing kernel upgrades:
sudo apt full-upgrade
Given all the changes, the completion of the foregoing commands' execution necessitates a system restart, which can be the most nerve-wracking part of the process when you are dealing with a remote server accessed using SSH. While, there are a few options for accomplishing this, here is one that is compatible with the upgrade cycle:
sudo systemctl reboot
Once you can log back into the system again, there is one more piece of housekeeping needed. This step not only removes redundant packages that were automatically installed, but also does the same for their configuration files, an act that really cleans up things. The command to execute is as follows:
sudo apt --purge autoremove
For added reassurance that the upgrade has completed, issuing the following command will show details like the operating system's distributor ID, description, release version and codename:
lsb_release -a
If you run the above commands as root, the sudo prefix is not needed, yet it is perhaps safer to execute them under a less privileged account anyway. The process needs the paying of attention to any prompts and questions about configuration files and service restarts if they arise. Nothing like that came up in my case, possibly because this web server serves flat files created using Hugo, avoiding the use of scripting and databases, which would add to the system complexity. Such a simple situation makes the use of scripting more of a possibility. The exercise was speedy enough for me too, though patience is of the essence should a 30–60 minute completion time be your lot, depending on your system and internet speed.
What to do when a GPG signature becomes invalid for a package repository on Linux Mint
During a package update on my main Linux system, I encountered the following kind of error message:
An error occurred during the signature verification. The repository is not updated and the previous index files will be used. GPG error: https://cli.github.com/packages stable InRelease: The following signatures were invalid: EXPKEYSIG <GPG Key> GitHub CLI
The message indicated a problem with the GPG signature verification for the GitHub CLI repository. The cause was that the signature for the repository was invalid, preventing the package manager from updating the repository's index files. The first step then was to remove the invalid GPG key using the following command:
sudo apt-key del <GPG Key>
With the invalid GPG key removed, the next step is to add the new GPG key for the GitHub CLI repository by issuing the following command:
curl -fsSL https://cli.github.com/packages/githubcli-archive-keyring.gpg | sudo tee /usr/share/keyrings/githubcli-archive-keyring.gpg > /dev/null
Once I had the new GPG key, I was able to use my usual system update process without any problem. The error message was gone, and updates and upgrades proceeded as intended.