Technology Tales

Adventures in consumer and enterprise technology

Remote access between Mac and Linux: Choosing the right approach

28th October 2025

Connecting from a Mac to a Linux desktop on the same network can be done in several ways, and the right choice depends on whether terminal access suffices or a full graphical session is needed. Terminal access is the simplest to arrange and often the most robust, while graphical access can be provided either by creating a fresh desktop session or by sharing the one already open on the Linux machine. Each approach trades ease of setup, performance and fidelity in different ways, so it helps to understand the options before settling on a configuration.

Understanding Your Requirements

The choice between methods rests primarily on three questions. First, is command-line access sufficient, or is a graphical desktop required? Second, if a desktop is needed, should it be a new session, or must it mirror the existing physical display? Third, how important is responsiveness compared to visual fidelity and feature completeness?

For administrative tasks that involve editing configuration files, managing services, or running scripts, SSH provides everything necessary. When a desktop environment is required, the decision becomes whether to view the exact state of the Linux machine's monitor or to work in a separate session.

The Three Main Options

SSH for Terminal Access

SSH requires no graphical overhead and works reliably over any connection. For many administrative tasks, this is all that is needed. Setting up SSH access is straightforward and forms the foundation for other secure operations, including file transfer and tunnelling.

RDP for New Desktop Sessions

Remote Desktop Protocol excels at creating new sessions with clean input handling and good performance over imperfect connections. RDP with a lightweight desktop such as Xfce delivers the most responsive experience for new sessions, though it does not support compositing desktops like Cinnamon well. The protocol translates keyboard and mouse input in a way that many clients have optimised for years, making it the most forgiving route when precise input behaviour matters.

VNC for Virtual or Shared Desktops

VNC can either create new virtual desktops or share the physical display. TigerVNC is suitable when a new Cinnamon session is acceptable and continuity with the local environment is valued. It can launch a full Cinnamon desktop in a virtual session, though it may feel less responsive than RDP, particularly when network conditions are suboptimal.

x11vnc mirrors the physical display exactly, making it ideal for monitoring ongoing work or providing remote guidance. This is the only option when the requirement is to see precisely what appears on the Linux machine's screen. However, it shares the same performance characteristics as other VNC solutions and is limited to showing what is already displayed locally.

Making the Choice

The decision ultimately comes down to the specific use case. If the goal is to work efficiently in a fresh desktop session with optimal responsiveness, RDP to an Xfce desktop environment is the clear choice. If maintaining the full Cinnamon experience in a new session is important, TigerVNC provides that continuity. When the task requires seeing or controlling the exact desktop session that is already running on the Linux machine, x11vnc is the only viable option.

In the articles that follow, we will examine the practical setup and configuration of x11vnc for sharing physical desktops, followed by detailed guidance on SSH, RDP and TigerVNC for those preferring terminal access or fresh desktop sessions.

What's Next

Part 2 explores x11vnc in detail, covering everything from basic setup to advanced performance tuning, input handling with KVM switches, clipboard troubleshooting and running x11vnc as a system service.

Part 3 examines SSH for terminal access, RDP with Xfce for responsive remote sessions, and TigerVNC for virtual Cinnamon desktops, along with file transfer options and operational considerations.

Some PowerShell fundamentals for practical automation

27th October 2025

In the last few months, I have taken to using PowerShell for automating tasks while working on a new contract. There has been an element of vibe programming in some of the scripts, which is why I wished to collate a reference guide that anyone can have to hand. While working with PowerShell every day does help to reinforce the learning, it also helps to look up granular concepts at a more bite-sized level. This especially matters given PowerShell's object-oriented approach. After all, many of us build things up iteratively from little steps, which also allows for more flexibility. Using an AI is all very well, yet the fastest recall is always from your own head.

1. Variables and Basic Data Types

Variables start with a dollar sign and hold values you intend to reuse, so names like $date, $outDir and $finalDir become anchors for later operations. Dates are a frequent companion in filenames and logs, and PowerShell's Get-Date makes this straightforward. A format string such as Get-Date -Format "yyyy-MM-dd" yields values like 2025-10-27, while Get-Date -Format "yyyy-MM-dd HH:mm:ss" adds a precise timestamp that helps when tracing the order of events. Because these commands return text when a format is specified, you can stitch the results into other strings without fuss.
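
As a brief sketch of how these pieces combine (the folder name is an invention of my own rather than anything fixed):

$date = Get-Date -Format "yyyy-MM-dd"                 # e.g. 2025-10-27
$timestamp = Get-Date -Format "yyyy-MM-dd HH:mm:ss"   # precise timestamp for logs
$outDir = "C:\work\psoutput_$date"                    # date stitched into a folder name
Write-Output "Run started at $timestamp, writing to $outDir"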

2. File System Operations

As soon as you start handling files, you will meet a cluster of commands that make navigation robust rather than fragile. Join-Path assembles folder segments without worrying about stray slashes, Test-Path checks for the existence of a target, and New-Item creates folders when needed. Moving items with Move-Item keeps the momentum going once the structure exists.

Environment variables give cross-machine resilience; reading $env:TEMP finds the system's temporary area, and [Environment]::GetFolderPath("MyDocuments") retrieves a well-known Windows location without hard-coding. Setting context helps too, so Set-Location acts much like cd to make a directory the default focus for subsequent file operations. You can combine these approaches, as in cd ([Environment]::GetFolderPath("MyDocuments")), which navigates directly to the My Documents folder without hard-coded paths.

Scripts are often paired with nearby files, and Split-Path $ScriptPath -Parent extracts a parent folder from a full path so you can create companions in the same place. Network locations behave like local ones, with Universal Naming Convention paths beginning \\ supported throughout, and Windows paths do not require careful case matching because the file systems are generally case-insensitive, which differs from many Unix-based systems. Even simple details matter, so constructing strings such as "$Folder\*" is enough for wildcard searches, with backslashes treated correctly and whitespace handled sensibly.
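
A compact sketch, using folder and file names of my own invention, shows how these commands fit together:

$docs = [Environment]::GetFolderPath("MyDocuments")
$outDir = Join-Path $docs "psoutput"
if (-not (Test-Path $outDir)) {
    New-Item -ItemType Directory -Path $outDir | Out-Null   # create only when missing
}
Set-Location $outDir
Move-Item -Path (Join-Path $env:TEMP "report.txt") -Destination $outDir   # report.txt is a made-up file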

3. Arrays and Collections

Arrays are created with @() and make it easy to keep related items together, whether those are folders in a $locs array or filenames gathered into $progs1, $progs2 and others. Indexing retrieves specific positions with square brackets, so $locs[0] returns the first entry, and a variable index like $outFiles[$i] supports loop counters.

A single value can still sit in an array using syntax such as @("bm_rc_report.sas"), which keeps your code consistent when functions always expect sequences. Any collection advertises how many items it holds using the Count property, so checking $files.Count equals zero tells you whether there is anything to process.
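
For instance, with some example values (the folder paths are invented):

$locs = @("C:\data\projA", "C:\data\projB")   # made-up folder names
$progs1 = @("bm_rc_report.sas")               # a single value kept as an array
$locs[0]                                      # returns C:\data\projA
if ($locs.Count -eq 0) { Write-Output "Nothing to process" }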

4. Hash Tables

When you need fast lookups, a hash table works as a dictionary that associates keys and values. Creating one with @{$locs[0] = $progs1} ties the first location to its corresponding programme list and then $locsProgs[$loc] retrieves the associated filenames for whichever folder you are considering. This is a neat stepping stone to loops and conditionals because it organises data around meaningful relationships rather than leaving you to juggle parallel arrays.
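
Continuing the example arrays from above (redefined here, with invented filenames, so the sketch stands alone):

$locs = @("C:\data\projA", "C:\data\projB")
$progs1 = @("bm_rc_report.sas")
$progs2 = @("ae_listing.sas", "cm_listing.sas")
$locsProgs = @{ $locs[0] = $progs1; $locs[1] = $progs2 }
foreach ($loc in $locs) {
    Write-Output "$loc -> $($locsProgs[$loc] -join ', ')"
}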

5. Control Flow

Control flow is where scripts begin to feel purposeful. A foreach loop steps through the items in a collection and is comfortable with nested passes, so you might iterate through folders, then the files inside each folder, and then a set of search patterns for those files. A for loop offers a counting pattern with initialisation, a condition and an increment written as for ($i = 0; $i -lt 5; $i++). It differs from foreach by focusing on the numeric progression rather than the items themselves.

Counters are introduced with $i = 0 and advanced with $i++, which in turn blends well with array indexing. Conditions gate work to what needs doing. Patterns such as if (-not (Test-Path ...)) reduce needless operations by creating folders only when they do not exist, and an else branch can note alternative outcomes, such as a message that a search pattern was not found.

Sometimes there is nothing to gain from proceeding, and break exits the current loop immediately, which is an efficient way to stop retrying once a log write succeeds. At other times it is better to skip just the current iteration, and continue moves directly to the next pass, which proves useful when a file list turns out to be empty for a given pattern.
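
Put together, with invented folder and file names, the patterns look something like this:

$locs = @("C:\data\projA", "C:\data\projB")
foreach ($loc in $locs) {
    $files = Get-ChildItem -Path $loc -Filter "*.sas"
    if ($files.Count -eq 0) { continue }            # nothing here, skip to the next folder
    foreach ($file in $files) {
        if ($file.Name -eq "stop.sas") { break }    # a made-up sentinel: abandon this folder
        Write-Output "Processing $($file.FullName)"
    }
}
for ($i = 0; $i -lt 5; $i++) { Write-Output "Pass $i" }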

6. String Operations

Strings support much of this work, so several operations are worth learning well. String interpolation allows variables to be embedded inside text using "$variable" or by wrapping expressions as "$($expression)", which becomes handy when constructing paths like "psoutput$($date)".

Splitting text is as simple as -split, and a statement such as $stub, $type = $File -split "\." divides a filename around its dot, assigning the parts to two variables in one step. This demonstrates multiple variable assignment, where the first part goes to $stub and the second to $type, allowing you to decompose strings efficiently.

When transforming text, the -Replace operator substitutes all occurrences of one pattern with another, and you can chain replacements, as in -replace $Match, $Replace -replace $Match2, $Replace2, so each change applies to the modified output of the previous one.

Building new names clearly is easier with braces, as in "${stub}_${date}.txt", which prevents ambiguity when variable names abut other characters. Escaping characters is sometimes needed, so using "\." treats a dot as a literal in a split operation. The backtick character ` serves as PowerShell's escape character and introduces special sequences like a newline written as `n, a tab as `t and a carriage return as `r. When you need to preserve formatting across lines without worrying about escapes, here-strings created with @" ... "@ keep indentation and line breaks intact.
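
A short sketch, with a made-up filename, brings several of these together:

$date = Get-Date -Format "yyyy-MM-dd"
$File = "bm_rc_report.sas"
$stub, $type = $File -split "\."                     # "bm_rc_report" and "sas"
$newName = "${stub}_${date}.txt"                     # braces keep the variable names unambiguous
$cleaned = $File -replace "report", "summary" -replace "bm_", ""   # chained replacements, left to right
$note = @"
Renamed $File
to $newName
"@
Write-Output $note                                   # the here-string preserves the line break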

7. Pipeline Operations

PowerShell's pipeline threads operations together so that the output of one command flows to the next. The pipe character | links each stage, and commands such as ForEach-Object (which processes each item), Where-Object (which filters items based on conditions) and Sort-Object -Unique (which removes duplicates) become building blocks that shape data progressively.

Within these blocks, the current item appears as $_, and properties exposed by commands can be read with syntax like $_.InputObject or $_.SideIndicator, the latter being especially relevant when handling comparison results. With pipeline formatting, you can emit compact summaries, as in ForEach-Object { "$($_.SideIndicator) $($_.InputObject)" }, which brings together multiple properties into a single line of output.

A multi-stage pipeline filtering approach often follows three stages: Select-String finds matches, ForEach-Object extracts only the values you need, and Where-Object discards anything that fails your criteria. This progressive refinement lets you start broad and narrow results step by step. There is no compulsion to over-engineer, though; a simplified pipeline might omit filtering stages if the initial search is already precise enough to return only what you need.
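
As an illustrative three-stage pipeline (the log folder and the exclusion word are assumptions of mine rather than anything fixed):

# Find GRC-prefixed names in the logs, keep only the matching text,
# drop anything containing TEST, then remove duplicates.
Select-String -Path "C:\logs\*.log" -Pattern "GRC\w*" |
    ForEach-Object { $_.Matches.Value } |
    Where-Object { $_ -notmatch "TEST" } |
    Sort-Object -Unique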

8. Comparison and Matching

Behind many of these steps sit comparison and matching operators that extend beyond simple equality. Pattern matching appears through -notmatch, which uses regular expressions to decide whether a value does not fit a given pattern, and it sits alongside -eq, -ne and -lt for equality, inequality and numeric comparison.

Complex conditions chain with -and, so an expression such as $_ -notmatch '^%macro$' -and $_ -notmatch '^%mend$' ensures both constraints are satisfied before an item passes through. Negative matching in particular helps exclude unwanted lines while leaving the rest untouched.
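
For example, applied to a single invented line of text:

$line = "%let study = GRC123;"                               # a made-up SAS statement
if ($line -notmatch '^%macro$' -and $line -notmatch '^%mend$') {
    Write-Output "Keep: $line"                               # both constraints satisfied
}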

9. Regular Expressions

Regular expressions define patterns that match or search for text, often surfacing through operators such as -match and -replace. Simple patterns like \.log$ identify strings ending with .log, while more elaborate ones capture groups using parentheses, as in (sdtm\.|adam\.), which finds two alternative prefixes.

Anchors matter, so ^ pins a match to the start of a line and $ pins it to the end, which is why ^%macro$ means an entire line consists of nothing but %macro. Character classes provide shortcuts such as \w for word characters (letters, digits or underscores) and \s for whitespace. The pattern "GRC\w*" matches "GRC" followed by zero or more word characters, demonstrating how * controls repetition. Other quantifiers like + (one or more) and ? (zero or one) offer further control.

Escaping special characters with a backslash turns them into literals, so \. matches a dot rather than any character. More complex patterns like '%(m|v).*?(?=[,(;\s])' combine alternation with non-greedy matching and lookaheads to define precise search criteria.

When working with matches in pipelines, $_.Matches.Value extracts the actual text that matched the pattern, rather than returning the entire line where the match was found. This proves essential when you need just the matching portion for further processing. The syntax can appear dense at first, but PowerShell's integration means you can test patterns quickly within a pipeline or conditional, refining as you go.
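
A few quick probes show how the operators and automatic variables behave (the sample strings are my own):

"program.log" -match '\.log$'                          # True: ends with a literal .log
$text = "proc print data=sdtm.ae;"
if ($text -match '(sdtm\.|adam\.)') {
    Write-Output "Found prefix: $($Matches[1])"        # $Matches holds the captured groups
}
"report GRC123 final" | Select-String -Pattern 'GRC\w*' |
    ForEach-Object { $_.Matches.Value }                # returns GRC123, not the whole line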

10. File Content Operations

Searching file content with Select-String applies regular expressions to lines and returns match objects, while Out-File writes text to files with options such as -Append and -Encoding UTF8 to control how content is persisted.
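
A minimal pairing of the two, with made-up paths, might read:

# Copy any ERROR lines from one log into a running errors file.
Select-String -Path "C:\logs\run1.log" -Pattern "ERROR" |
    ForEach-Object { $_.Line } |
    Out-File -FilePath "C:\logs\errors.txt" -Append -Encoding UTF8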

11. File and Directory Searching

Commands for locating files typically combine path operations with filters. Get-ChildItem retrieves items from a folder, and parameters like -Filter or -Include narrow results by pattern. Wildcards such as * are often enough, but regular expressions provide finer control when integrated with pipeline operations. Recursion through subdirectories is available with -Recurse, and combining these techniques allows you to find specific files scattered across a directory tree. Once items are located, properties like FullName, Name and LastWriteTime let you decide what to do next.
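
For instance, to list recently modified SAS programs anywhere under a folder (the path and the seven-day window are arbitrary choices of mine):

Get-ChildItem -Path "C:\data" -Filter "*.sas" -Recurse |
    Where-Object { $_.LastWriteTime -gt (Get-Date).AddDays(-7) } |
    ForEach-Object { $_.FullName }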

12. Object Properties

Objects exposed by commands carry properties that you can access directly. $File.FullName retrieves an absolute path from a file object, while names, sizes and modification timestamps are all available as well. Subexpressions introduced with $() evaluate an inner expression within a larger string or command, which is why $($File.FullName) is often seen when embedding property values in strings. However, subexpressions are not always required; direct property access works cleanly in many contexts. For instance, $File.FullName -Replace ... reads naturally and works as you would expect because the property access is unambiguous when used as a command argument rather than embedded within a string.
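
In practice, with an invented log file:

$File = Get-Item "C:\logs\run1.log"
$File.FullName                                      # direct property access
"Last written: $($File.LastWriteTime)"              # subexpression needed inside a string
$File.FullName -replace "\.log$", ".txt"            # property access works directly as an argument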

13. Output and Logging

Producing output that can be read later is easier if you apply a few conventions. Write-Output sends structured lines to the console or pipeline, while Write-Warning signals notable conditions without halting execution, a helpful way to flag missing files. There are times when command output is unnecessary, and piping to Out-Null discards it quietly, for example when creating directories. Larger scripts benefit from consistency and a short custom function such as Write-Log establishes a uniform format for messages, optionally pairing console output with a line written to a file.
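
One possible shape for such a Write-Log function, kept deliberately small, is this sketch (the log path is only an example):

function Write-Log {
    param([string]$Message, [string]$LogFile)
    $stamp = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
    $line = "$stamp  $Message"
    Write-Output $line                                # console or pipeline
    if ($LogFile) {
        $line | Out-File -FilePath $LogFile -Append -Encoding UTF8
    }
}
Write-Log -Message "Run started" -LogFile "C:\logs\run.log"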

14. Functions

Functions tie these pieces together as reusable blocks with a clear interface. Defining one with function Get-UniquePatternMatches { } sets the structure, and a param() block declares the inputs. Strongly typed parameters like [string[]] make it clear that a function accepts an array of strings, and naming parameters $Folder and $Pattern describes their roles without additional comments.

Functions are called using named parameters in the format Get-UniquePatternMatches -Folder $loc -Pattern '(sdtm\.|adam\.)', which makes the intent explicit. It is common to pass several arrays into similar functions, so a function might have many parameters of the same type. Using clear, descriptive names such as $Match, $Replace, $Match2 and $Replace2 leaves little doubt about intent, even if an array of replacement rules would sometimes reduce repetition.

Positional parameters are also available; when calling Do-Compare you can omit parameter names and rely on the order defined in param(). PowerShell follows verb-noun naming conventions for functions, with common verbs including Get, Set, New, Remove, Copy, Move and Test. A name like Multiply-Files keeps to the verb-noun shape, even though Multiply is not among the approved verbs, so sticking to the common ones places your code in the mainstream of PowerShell conventions.

It is worth avoiding a common pitfall where a function declares param([string[]]$Files) but inadvertently reads a variable like $progs from outside the function. PowerShell allows this via scope inheritance, where functions can access variables from parent scopes, but it makes maintenance harder and disguises the function's true dependencies. Being explicit about parameters creates more maintainable code.

Simple functions can still do useful work without complexity. A minimal function implementation with basic looping and conditional logic can accomplish useful tasks, and a recurring structure can be reused with minor revisions, swapping one regular expression for another while leaving the looping and logging intact. Replacement chains are flexible; add as many -replace steps as are needed, and no more.

Parameters can be reused meaningfully too: a $Match variable can serve double duty, first as a filename filter in -Include and then as a text pattern for -replace. Nested function calls tie output and input together, as when piping a here-string to Out-File (Join-Path ...) to construct a file path at the moment of writing.
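
By way of illustration, here is one way that a Get-UniquePatternMatches function might be shaped; the body is my own sketch rather than a fixed recipe, and the folder name is invented:

function Get-UniquePatternMatches {
    param(
        [string[]]$Folder,
        [string]$Pattern
    )
    foreach ($f in $Folder) {
        Get-ChildItem -Path $f -Filter "*.sas" |
            Select-String -Pattern $Pattern |
            ForEach-Object { $_.Matches.Value } |
            Sort-Object -Unique
    }
}
$loc = "C:\data\projA"
Get-UniquePatternMatches -Folder $loc -Pattern '(sdtm\.|adam\.)'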

15. Comments

Comments play a quiet but essential role. A line starting with # explains why something is the way it is or temporarily disables a line without deleting it, which is invaluable when testing and refining.

16. File Comparison

Comparison across datasets rounds out common tasks, and Compare-Object identifies differences between two sets, telling you which items are unique to each side or shared by both. Side indicators in the output are compact: <= shows the first set, => the second, and == indicates an item present in both.
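
A short example, comparing two made-up listings and showing the side indicators:

$old = Get-Content "C:\data\files_old.txt"
$new = Get-Content "C:\data\files_new.txt"
Compare-Object -ReferenceObject $old -DifferenceObject $new -IncludeEqual |
    ForEach-Object { "$($_.SideIndicator) $($_.InputObject)" }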

17. Common Parameters

Across many commands, common parameters behave consistently. -Force allows operations that would otherwise be blocked and overwrites existing items without prompting in contexts that support it, -LiteralPath treats a path exactly as written without interpreting wildcards, and -Append adds content to existing files rather than overwriting them. These options smooth edges when you know what you want a command to do and prefer to avoid interactive questions or unintended pattern expansion.
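
Seen in context, with example paths of my own choosing:

New-Item -ItemType Directory -Path "C:\work\psoutput" -Force | Out-Null    # no prompt if it already exists
Get-Item -LiteralPath "C:\data\report[1].sas"                              # brackets taken literally, not as wildcards
"Run complete" | Out-File -FilePath "C:\logs\run.log" -Append              # adds to the file rather than overwriting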

18. Advanced Scripting Features

A number of advanced features make scripts sturdier. Automatic variables such as $MyInvocation.MyCommand.Path provide information about the running script, including its full path, which is practical for locating resources relative to the script file. Set-StrictMode -Version Latest enforces stricter rules that turn common mistakes into immediate errors, such as using an uninitialised variable or referencing a property that does not exist. Clearing the console at the outset with Clear-Host gives a clean slate for output when a script begins.
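
At the top of a saved script, that trio might look like this (settings.txt is only an imagined companion file):

Set-StrictMode -Version Latest                              # uninitialised variables now raise errors
Clear-Host                                                  # start with a clean console
$scriptDir = Split-Path $MyInvocation.MyCommand.Path -Parent
$configPath = Join-Path $scriptDir "settings.txt"           # a resource kept beside the script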

19. .NET Framework Integration

Integration with the .NET Framework extends PowerShell's reach. For instance, calling [System.IO.Path]::GetFileNameWithoutExtension() extracts a base filename using a tested library method. To gain more control over file I/O, [System.IO.File]::Open() and System.IO.StreamWriter expose low-level handles that specify sharing and access modes, which can help when you need to coordinate writing without blocking other readers. File sharing options like [System.IO.FileShare]::Read allow other processes to read a log file while the script writes to it, reducing contention and surprises.
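
A tentative sketch of the file-sharing idea, with an invented log path, could run along these lines:

$base = [System.IO.Path]::GetFileNameWithoutExtension("C:\logs\run1.log")   # gives run1
$stream = [System.IO.File]::Open("C:\logs\run.log",
    [System.IO.FileMode]::Append,
    [System.IO.FileAccess]::Write,
    [System.IO.FileShare]::Read)                # other processes may read while we write
$writer = New-Object System.IO.StreamWriter($stream)
$writer.WriteLine("Entry written at $(Get-Date -Format 'HH:mm:ss')")
$writer.Close()                                 # closes the underlying stream as well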

20. Error Handling

Error handling deserves a clear pattern. Wrapping risky operations in try { } catch { } blocks captures exceptions, so a script can respond gracefully, perhaps by writing a warning and moving on. A finally block can be added for clean-up operations that must run regardless of success or failure.

When transient conditions are expected, a retry logic pattern is often enough, pairing a counter with Start-Sleep to attempt an operation several times before giving up. Waiting for a brief period such as Start-Sleep -Milliseconds 200 gives other processes time to release locks or for temporary conditions to clear.

Alongside this, checking for null values keeps assumptions in check, so conditions like if ($null -ne $process) ensure that you only read properties when an object was created successfully. This defensive approach prevents cascading errors when operations fail to return expected objects.
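
Putting the pieces together, a retry sketch along these lines is often enough (the log path and the five-attempt limit are arbitrary choices):

$attempt = 0
$written = $false
while (-not $written -and $attempt -lt 5) {
    try {
        "Progress update" | Out-File -FilePath "C:\logs\run.log" -Append -ErrorAction Stop
        $written = $true
    }
    catch {
        $attempt++
        Start-Sleep -Milliseconds 200            # give another process time to release the file
    }
}
if (-not $written) { Write-Warning "Could not write to the log after 5 attempts" }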

21. External Process Management

Managing external programmes is a common requirement, and PowerShell's Start-Process offers a controlled route. Several parameters control its behaviour precisely:

The -Wait parameter makes PowerShell pause until the external process completes, essential for sequential processing where later steps depend on earlier ones. The -PassThru parameter returns a process object, allowing you to inspect properties like exit codes after execution completes. The -NoNewWindow parameter runs the external process in the current console rather than opening a new window, keeping output consolidated. If a command expects the Command Prompt environment, calling it via cmd.exe /c $cmd integrates cleanly, ensuring compatibility with programmes designed for the CMD shell.

Exit codes reported with $process.ExitCode indicate success with zero and errors with non-zero values in most tools, so checking these numbers preserves confidence in the sequence of steps. Running external programmes synchronously, one file at a time rather than in parallel, can be an advantage when dependencies exist between stages or when you need to ensure ordered completion.
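
A hedged example, using a made-up command for the CMD shell, ties these parameters together:

$cmd = "dir C:\data\*.sas"                       # an invented command that expects cmd.exe
$process = Start-Process -FilePath "cmd.exe" -ArgumentList "/c $cmd" -Wait -PassThru -NoNewWindow
if ($null -ne $process) {
    if ($process.ExitCode -eq 0) {
        Write-Output "Command completed successfully"
    } else {
        Write-Warning "Command failed with exit code $($process.ExitCode)"
    }
}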

22. Script Termination

Scripts need to finish in a way that other tools understand. Exiting with Exit 0 signals success to schedulers and orchestrators that depend on numeric codes, while non-zero values indicate error conditions that trigger alerts or retries.

Bringing It All Together

Because this is a granular selection, it is left to us to piece everything together to accomplish the tasks that we have to complete. In that way, we can embed the knowledge so that we are not vibe coding all the time, ensuring that a more deterministic path is followed.

Generating Git commit messages automatically using aicommit and OpenAI

25th October 2025

One of the overheads of using version control systems like Subversion or Git is the need to create descriptive messages for each revision. Now that GitHub has its Copilot, it can generate those messages for you. However, that still leaves anyone with a local Git repository out in the cold, even if you are uploading to GitHub as your remote repo.

One thing that a Homebrew update does is highlight other packages that are available, which is how I came to learn of a tool that helps with this: aicommit. Installing it is just a simple command away:

brew install aicommit

Once that is complete, you have a tool that generates messages describing every commit using GPT. For it to work, you do need to get yourself set up with OpenAI's API services and generate a token that you can use. That needs an environment variable to be set to make it available. On Linux (and Mac), this works:

export OPENAI_API_KEY=<Enter the API token here, without the brackets>

Because I use this API for Python scripting, that part was already in place. Thus, I could proceed to the next stage: inserting it into my workflow. For the sake of added automation, this uses shell scripting on my machines. The basic sequence is this:

git add .
git commit -m "<a default message>"
git push

The first line above stages everything while the second commits the files with an associated message (git makes this mandatory, much like Subversion) and the third pushes the files into the GitHub repository. Fitting in aicommit then changes the above to this:

git add .
aicommit
git push

There is now no need to define a message because aicommit does that for you, saving some effort. However, token limitations on the OpenAI side mean that the aicommit command can fail, causing the update operation to abort. Thus, it is safer to catch that situation using the following code:

git add .
if ! aicommit 2>/dev/null; then
    echo "  aicommit failed, using fallback"
    git commit -m "<a default message>"
fi
git push

This now informs me what has happened when the AI option is overloaded, and the scripts fall back to a default option that is always available with Git. While there is more to my Git scripting than this, the snippets included here should get across how things can work. They work well for small push operations, which is what happens most of the time; usually, I do not attempt more than that.

Command line installation and upgrading of VSCode and VSCodium on Windows, macOS and Linux

25th October 2025

Downloading and installing software packages from a website is all very well until you need to update them. Then, a single command streamlines the process significantly. Given that VSCode and VSCodium are updated regularly, this becomes all the more pertinent and explains why I chose them for this piece.

Windows

Now that Windows 10 is more or less behind us, we can focus on Windows 11. That comes with the winget command by default, which is handy because it allows command-line installation of anything in its package repositories, VSCode and VSCodium included. The commands can be as simple as these:

winget install Microsoft.VisualStudioCode
winget install VSCodium.VSCodium

The above is shorthand for this, though:

winget install --id Microsoft.VisualStudioCode
winget install --id VSCodium.VSCodium

If you want exact matches, the above then becomes:

winget install -e --id Microsoft.VisualStudioCode
winget install -e --id VSCodium.VSCodium

For upgrades, this is what is needed:

winget upgrade Microsoft.VisualStudioCode
winget upgrade VSCodium.VSCodium

Even better, you can upgrade everything at once with a single operation:

winget upgrade --all

The last part certainly beats the round trip to a website followed by an installation GUI. There is a lot less mouse clicking, for one thing.

macOS

On macOS, you need to have Homebrew installed to make things more streamlined. To get that in place, you need to run the following command (which may require you to enter your system password along the way):

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Then, you can execute one or both of these in the Terminal app, perhaps having to authorise everything with your password when requested to do so:

brew install --cask visual-studio-code
brew install --cask vscodium

The reason for the --cask switch is that these are apps that should go into the correct locations on macOS, as well as having their icons appear in Launchpad. Omitting it is fine for command line utilities, but not for these.

To update and upgrade everything that you have installed via Homebrew, just issue the following in a terminal session:

brew update && brew upgrade

Debian, Ubuntu & Linux Mint

Like any other Debian or Ubuntu derivative, Linux Mint has its own in-built package management system via apt. Other Linux distributions have their own way of doing things (Fedora and Arch come to mind here), yet the essential idea is similar in many cases. Because there are a number of steps, I have split out VSCode from VSCodium for added clarity. Because of the way that things are set up, one or both apps can be updated using the usual apt commands without individual attention.

VSCode

The first step is to download the repository key using the following command:

wget -qO- https://packages.microsoft.com/keys/microsoft.asc \
| gpg --dearmor > packages.microsoft.gpg
sudo install -D -o root -g root -m 644 packages.microsoft.gpg /etc/apt/keyrings/packages.microsoft.gpg

Then, you can add the repository like this:

echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/packages.microsoft.gpg] \
https://packages.microsoft.com/repos/code stable main" \
| sudo tee /etc/apt/sources.list.d/vscode.list

With that in place, the last thing that you need to do is issue the command for doing the installation from the repository:

sudo apt update; sudo apt install code

Above, I have put two commands together: one to update the repository and another to do the installation.

VSCodium

Since the VSCodium process is similar, here are the three commands together: one for downloading the repository key, another that adds the new repository and one more to perform the repository updates and subsequent installation:

curl -fSsL https://gitlab.com/paulcarroty/vscodium-deb-rpm-repo/raw/master/pub.gpg \
| sudo gpg --dearmor | sudo tee /usr/share/keyrings/vscodium-archive-keyring.gpg >/dev/null

echo "deb [arch=amd64 signed-by=/usr/share/keyrings/vscodium-archive-keyring.gpg] \
https://download.vscodium.com/debs vscodium main" \
| sudo tee /etc/apt/sources.list.d/vscodium.sources

sudo apt update; sudo apt install codium

After the three steps have completed successfully, VSCodium is installed and available to use on your system, and is accessible through the menus too.

A more elegant way to read and combine data from multiple CSV files in Julia

24th October 2025

When I was compiling financial information for my accountant recently, I needed to read in a number of CSV files and combine their contents to enable further processing. This was all in a Julia script, and there was a time when I would use an explicit loop to do this combination. However, I came across a better way to accomplish this that I am sharing here with you now. First, you need to define a list of files like this:

files = ["5-2024.csv", "6-2024.csv", "7-2024.csv", "9-2024.csv", "10-2024.csv", "11-2024.csv", "12-2024.csv", "1-2025.csv", "2-2025.csv", "3-2025.csv", "4-2025.csv"]

While there are alternatives to the above, including globbing (using wildcards through a Julia package that supports them), I decided to keep things simple for myself. Now we come to the line that does all the heavy lifting:

df = vcat([CSV.read(dir * file, DataFrame, normalizenames = true, header = 5, skipto = 6; silencewarnings=true) for file in files]...)

Near the end, there is the list comprehension ([... for file in files]) that avoids the need for the explicit loop that I have used a few times in the past. This loops through each file in the list defined at the top, reading it into a dataframe as specified by the DataFrame argument. The normalizenames option replaces spaces with underscores and cleans up any invalid characters. The header and skipto options tell Julia where to find the column headings and where to start reading the file, respectively. Then, the silencewarnings option suppresses any warnings about missing columns or inconsistent rows; clearly a check on the data frame is needed to ensure that all is in order if you wish to go the same route as I did.

The splat (...) operator takes the resulting list of data frames and converts them into individual arguments passed to the vcat function, which vertically concatenates them to create the df data frame. Just like suppressing warnings about missing columns or inconsistent rows at CSV read time, this involves trusting that the input data is structured alike across files. Naturally, you need to do your own checks to ensure that is the case, as it was for me with what I had to do.

Loading API Keys from Linux shell environment variables in Python with Dotenv

23rd October 2025

Recently, I ran into trouble with getting Python to pick up an API key that I had defined in the underlying bash environment. This was within a Python console running inside the Positron IDE for R and Python scripting. Opening up the folder containing my Python scripts within the IDE was part of the solution. The next part was creating a .env file within the same folder. A line like this was added within the new file:

export API_KEY="<API key value>"

That meant that code like the following then read in the API key in a more robust manner:

import os
from dotenv import load_dotenv
load_dotenv()
api_key = os.getenv('API_KEY', 'default_value')

This imports the os module and the load_dotenv method from the dotenv package. Then, load_dotenv is executed to load the .env file and its contents. After that, the os.getenv function can assign the API key to a Python variable from the value of the environment variable.

Since this also was within a Git repository, a .gitignore file needed creating with the contents .env to avoid that file being uploaded to GitHub, which is the last place where you should be storing credentials like passwords, passphrases and API keys. While my repository may be private, the state of things in these troubled times means that even that is no failsafe.

Taking control of Ruff checks on Python scripts

22nd October 2025

Positron is becoming my tool of choice for developing Python code. Along with using a Python console as a REPL environment, it also includes Ruff for checking code compliance. One of its rules is that Python module imports must be declared at the top. However, I want to use some code that checks for the presence of any modules used in a script, installing those that are missing. This means that import statements appear later in a script than Ruff recommends, making me wish for a way to turn off that check since things run well anyway. The chosen solution is to create a file called pyproject.toml in the directory where my scripts are stored and add the following lines in there to accomplish what I want:

[tool.ruff]
ignore = ["E402"]

Here, it helps if you open a folder in Positron, achieving the same outcome as you would in the VSCode on which the IDE is based. While I have only listed one check here, you also can have a comma-delimited list of quoted strings if you need to switch off more than one rule at once.

Controlling the version of Python used in the Positron console with virtual environments

21st October 2025

Because I have Homebrew installed on my Linux system for getting Hugo and LanguageTool on there, I also have a later version of Python than is available from the Linux Mint repositories. Both 3.12 and 3.13 are on my machine as a consequence. Here is the line in my .bashrc file that makes that happen:

eval "$(/home/linuxbrew/.linuxbrew/bin/brew shellenv)"

The result is when I issue the command which python3, this is what I get:

/home/linuxbrew/.linuxbrew/bin/python3

However, Positron looks to /usr/bin/python3 by default. Since this can get confusing, setting up a virtual environment has its uses, as long as you create it with the intended Python version. This is how you can do it, even if I needed to use sudo mode for some reason:

python3 -m venv .venv

When working solely on the command line, activating it becomes a necessity, adding another manual step to a workflow that I had resisted until recently:

source .venv/bin/activate

Thankfully, just issuing the deactivate command will do the eponymous action. Even better, just opening a folder with a venv in Positron saves you from issuing the extra commands and grants you the desired Python version in the console that it opens. Having run into some clashes between package versions, I am beginning to appreciate having a dedicated environment for a set of Python scripts, especially when an IDE makes it easy to work with such an arrangement.

Avoiding Python missing package errors with automatic installation checks

20th October 2025

Though some may not like having something preceding package import statements in Python scripts, I prefer the added robustness of an extra piece of code checking for package presence and installing anything that is missing, in place of getting an error. In what follows, I define the list of packages that need to be present for everything to work:

required_packages = ["pandas", "tqdm", "progressbar2", "sqlalchemy", "pymysql"]

Then, I import the built-in modules before looping through the list that was already defined (adding special handling for a case where the import name differs from the package name):

import subprocess
import sys
for package in required_packages:
    try:
        __import__(package if package != "progressbar2" else "progressbar")
        print(f"{package} is already installed.")
    except ImportError:
        print(f"{package} not found. Installing...")
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])

The above code tries importing the package and catches the error to do the required installation. While a stable environment may be a better way around all of this, I find that this way of working adds valuable robustness to a script and automates what you would need to do anyway. Though the use of requirements files and even the Poetry tool for dependency management may be next steps, this approach suffices for my simpler needs, at least when it comes to personal projects.

Some Data Science newsletters that may be worth your time

19th October 2025

Staying informed about developments in data science and artificial intelligence without drowning in an endless stream of blog posts, research papers and tool announcements presents a genuine challenge for practitioners. The newsletters profiled below offer a solution to this problem by delivering curated digests at weekly or near-weekly intervals, filtering what matters from the constant flow of new content across the field. Each publication serves a distinct purpose, from broad data science coverage and community event notifications to AI business strategy and statistical foundations, allowing readers to select resources that match their specific interests, whether technical depth, practical application, career development or strategic awareness. What follows examines what each newsletter offers, who benefits most from subscribing, and what limitations or trade-offs readers should consider when choosing which digests merit a place in their inbox.

Data Elixir

Launched in 2014 by Lon Reisberg, this newsletter distinguishes itself through expert curation with minimal hype. It maintains strong editorial consistency and neutrality, presenting a handful of carefully selected articles that genuinely matter rather than overwhelming subscribers with dozens of links. The free version delivers this curated digest, whilst the Pro tier (fifty dollars annually) offers searchable archives spanning over 250 issues back to 2019, plus AI-powered learning tools including a SQL tutor and interview coach. The newsletter's defining characteristic is its quality-over-quantity approach, serving professionals who trust expert curation to surface what is genuinely important without the noise and hype that characterises many industry publications.

Data Science Weekly Newsletter

One of the oldest independent data science newsletters, having published over 400 issues since 2014, this publication sets itself apart through longevity and unwavering consistency. It delivers every Thursday without fail, maintaining a simple, distraction-free format with no over-commercialisation or fluff. Its unique value lies in this dependability, with subscribers knowing exactly what to expect each week, making it a practical baseline for staying current without surprises or dramatic shifts in editorial direction.

DataTalks.club

Unlike newsletters that simply curate external content, this publication builds its own ecosystem of learning resources, offering something fundamentally different through its open, community-driven approach. It combines free courses (Zoomcamps), events and a supportive Slack community, with all materials publicly available on GitHub. The newsletter keeps members informed about upcoming cohorts, webinars and talks within this collaborative environment. The defining feature is its entirely open and peer-supported approach, where readers gain access not just to information, but to hands-on learning opportunities and a community of practitioners willing to help each other grow.

KDnuggets

Founded in 1997 by Gregory Piatetsky-Shapiro, this publication stands apart through industry authority spanning nearly three decades. It holds unmatched credibility through its longevity and comprehensive coverage, known for its annual software polls, data science career resources and balanced mix of expert articles, surveys and tool trends that appeal equally to technical practitioners and managers seeking a global overview of the field. What sets it apart is this authoritative position, with few publications able to match its track record or breadth of influence across both technical and strategic aspects of data science and AI.

ODSC AI

Connected to the Open Data Science Conference network, this newsletter distinguishes itself as the gateway to the global data science event ecosystem. It serves as the practitioner's bridge to events, training, webinars and conferences worldwide. It covers the full stack, from tutorials and research to business use cases and career advice, but its distinctive strength lies in connecting readers to the broader data science community through live events and practical learning opportunities. The defining characteristic is this conference-linked, community-rich approach, proving especially valuable for professionals who want to remain active participants in the field rather than passive consumers of content.

Statology

Statology maintains a unique position by focusing entirely on statistical foundations. Whilst most data science newsletters chase the latest AI developments, it keeps an unwavering focus on statistics and foundational analysis, providing step-by-step tutorials for Excel, R and Python that emphasise statistical intuition over trendy techniques. This singular focus on fundamentals serves as an essential complement to AI-focused newsletters, helping readers build the statistical knowledge base that underpins sound data science practice.

The AI Report

Created by the makers of KDnuggets, this digital newsletter and media platform carves out a distinctive niche with business-focused AI news for non-technical leaders. It curates AI developments specifically for executives and decision-makers, emphasising practical, non-technical insights about tools, regulations and market moves, backed by an AI tool database and a claimed community of over 400,000 subscribers. What sets it apart is this strategic, implementation-focused perspective, concentrating on what AI means for business strategy rather than explaining how AI works, making it accessible to leaders without deep technical backgrounds.

The Batch

Published weekly by DeepLearning.AI, co-founded by Andrew Ng, this newsletter offers trusted commentary that combines AI news with insightful analysis. Written by leading experts, it provides a balanced view that merges academic grounding with applied, real-world context. The distinguishing feature is this authoritative perspective on implications, helping engineers, product teams and business leaders understand why developments matter and how to think about their practical impact rather than simply reporting what happened.

 
