Technology Tales

Adventures in consumer and enterprise technology

Welcome

  • We live in a world where computer software penetrates every life and is embedded in nearly every piece of hardware that we have. Everything is digital now; it goes well beyond the computers and cameras that were the main points of encounter when this website was started.

  • As if that were not enough, we have AI inexorably encroaching on our daily lives. The possibilities seem endless, so one needs to be careful not to get lost in it all. Novelty beguiles us now as much as it did when personal computing became widely available decades ago, and again when the internet arrived. Excitement has returned, at least for a while.

  • All this percolates through what is here, so just dive in and see what you can find!

Managing Python projects with Poetry

4th October 2025

Python Poetry has become a popular choice for managing Python projects because it unifies tasks that once required several tools. Instead of juggling pip for installation, virtualenv for isolation and setuptools for packaging, Poetry brings these strands together and aims to make everyday development feel predictable and tidy. It sits in the same family of all-in-one managers as npm for JavaScript and Cargo for Rust, offering a coherent workflow that spans dependency declaration, environment management and package publishing.

At the heart of Poetry is a simple idea: declare what a project needs in one place and let the tool do the orchestration. Projects describe their dependencies, development tools and metadata in a single configuration file, and Poetry ensures that what is installed on one machine can be replicated on another without nasty surprises. That reliability comes from the presence of a lock file. Once dependencies are resolved, their exact versions are recorded, so future installations repeat the same outcome. The intent here is not only convenience but determinism, helping teams avoid the "works on my machine" refrain that haunts software work.

- Core Concepts: Configuration and Lock Files

Two files do the heavy lifting. The pyproject.toml file is where a project announces its name, version and description, as well as the dependencies required to run and to develop it. The poetry.lock file captures the concrete resolution of those requirements at a particular moment. Together, they give you an auditable, repeatable picture of your environment. The structure of TOML keeps the configuration readable, and it spares developers from spreading equivalent settings across setup.cfg, setup.py and requirements.txt. A minimal example shows how this looks in practice.

[tool.poetry]
name = "my_project"
version = "0.1.0"
description = "Example project using Poetry"
authors = ["John <john@example.com>"]

[tool.poetry.dependencies]
python = "^3.10"
requests = "^2.31.0"

[tool.poetry.group.dev.dependencies]
pytest = "^8.0.0"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

- Essential Commands

Working with Poetry day to day quickly becomes a matter of a few memorable commands. Initialising a project configuration starts with poetry init, which steps through the creation of pyproject.toml interactively. Adding a dependency is handled by poetry add followed by the package name. Installing everything described in the configuration is done with poetry install, which writes or updates the lock file. When it is time to refresh dependencies within permitted version ranges, poetry update re-resolves and updates what's installed. Removing a dependency is poetry remove, followed by the package name. For environment management, poetry shell opens a shell inside the virtual environment managed by Poetry, and poetry run allows execution of commands within that same environment without entering a shell. Building distributions is as simple as poetry build, which produces a wheel and a source archive, and publishing to the Python Package Index is managed by poetry publish with credentials or an API token.
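
Gathered together, the commands just described look like this; the package names are purely illustrative.

poetry init                  # create pyproject.toml interactively
poetry add requests          # add a runtime dependency
poetry install               # install everything and write or update poetry.lock
poetry update                # refresh dependencies within permitted version ranges
poetry remove requests       # remove a dependency
poetry shell                 # open a shell inside the managed virtual environment
poetry run pytest            # run a command in that environment without entering a shell
poetry build                 # produce a wheel and a source archive
poetry publish               # upload to PyPI using configured credentials or a token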

- Advantages and Considerations

There are clear advantages to taking this route. The dependency experience is simplified because you do not need to keep updating a requirements.txt file by hand. With a lock file in place, environments are reproducible across developer machines and continuous integration runners, which stabilises builds and testing. Packaging is integrated rather than an extra chore, so producing and publishing a release becomes a repeatable process that sits naturally alongside development. Virtual environments are created and activated on demand, keeping projects isolated from one another with little ceremony. The configuration in TOML has the benefit of being structured and human-readable, which reduces the likelihood of configuration drift.

There are also points to consider before adopting Poetry. Projects that are deeply invested in setup.py or complex legacy build pipelines may need a clean migration to pyproject.toml to avoid clashes. Developers who prefer manual venv and pip workflows can find Poetry opinionated at first, because it expects to be responsible for the environment and dependency resolution. It is also designed with modern Python versions in mind, with the examples here using Python 3.10.

- Migration from pip and requirements.txt

For teams arriving from pip and requirements.txt, moving to Poetry can be done in measured steps. The starting point is installation. Poetry provides an installer script that sets up the tool for your user account.

curl -sSL https://install.python-poetry.org | python3 -

If the installer does not add Poetry to your PATH, adding $HOME/.local/bin to PATH resolves that, after which poetry --version confirms the installation. From the root of your existing project, poetry init creates a new pyproject.toml and invites you to provide metadata and dependencies. If you already maintain requirements.txt files for production and development dependencies, Poetry can ingest those in one sweep. A single file can be imported with poetry add $(cat requirements.txt). Where development dependencies live in a separate file, they can be added into Poetry's dev group with poetry add --group dev $(cat dev-requirements.txt). Once added, Poetry resolves and pins exact versions, leaving a lock file behind to capture the resolution. After verifying that everything installs and tests pass, it becomes safe to retire earlier environment artefacts. Many teams remove requirements.txt entirely if they plan to rely solely on Poetry, delete any Pipfile and Pipfile.lock remnants left by Pipenv and migrate metadata away from setup.py or setup.cfg in favour of pyproject.toml. With that done, using the environment becomes routine. Opening a shell inside the virtual environment with poetry shell makes commands such as python or pytest use the isolated interpreter. If you prefer to avoid entering a shell, poetry run python script.py or poetry run pytest executes the command in the right context.
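
For reference, the sequence described above condenses to a handful of commands, assuming that requirements.txt and dev-requirements.txt both exist in the project root.

curl -sSL https://install.python-poetry.org | python3 -
poetry --version
poetry init
poetry add $(cat requirements.txt)
poetry add --group dev $(cat dev-requirements.txt)
poetry install
poetry run pytest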

- Package Publishing

Publishing a package is one of the areas where Poetry streamlines the steps. Accurate metadata in pyproject.toml is important, so name, version, description and other fields should be up-to-date. An example configuration shows commonly used fields.

[tool.poetry]
name = "example-package"
version = "1.0.0"
description = "A simple example package"
authors = ["John <john@example.com>"]
license = "MIT"
readme = "README.md"
homepage = "https://github.com/john/example-package"
repository = "https://github.com/john/example-package"
keywords = ["example", "poetry"]

With metadata set, building the distribution is handled by poetry build, which creates a dist directory containing a .tar.gz source archive and a .whl wheel file. Uploading to the official Python Package Index can be done with username and password, though API tokens are the recommended method because they can be scoped and revoked without affecting account credentials. Configuring a token is done once with poetry config pypi-token.pypi, after which poetry publish will use it to upload. When testing a release before publishing for real, TestPyPI provides a safer target. Poetry supports multiple repositories and can be directed at TestPyPI by declaring it as a named repository and then publishing to it.

poetry config repositories.testpypi https://test.pypi.org/legacy/
poetry publish -r testpypi

Once uploaded, it is sensible to confirm that the package can be installed in a clean environment using pip install example-package, which verifies that dependencies are correctly declared and wheels are intact.

- Continuous Integration with GitHub Actions

Beyond local steps, automation closes the loop. Adding a continuous integration workflow that installs dependencies, runs tests and publishes on a tagged release keeps quality checks and distribution consistent. GitHub Actions provides a hosted environment where Poetry can be installed quickly, dependencies cached and tests executed. A straightforward workflow listens for tags that begin with v, such as v1.0.0, then builds and publishes the package once tests pass. The workflow file sits under .github/workflows and looks like this.

name: Publish to PyPI

on:
  push:
    tags:
      - "v*"

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - name: Check out repository
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.10"

      - name: Install Poetry
        run: |
          curl -sSL https://install.python-poetry.org | python3 -
          echo "$HOME/.local/bin" >> $GITHUB_PATH

      - name: Install dependencies
        run: poetry install --no-interaction --no-root

      - name: Run tests with pytest
        run: poetry run pytest --maxfail=1 --disable-warnings -q

      - name: Build package
        run: poetry build

      - name: Publish to PyPI
        if: startsWith(github.ref, 'refs/tags/v')
        env:
          POETRY_PYPI_TOKEN_PYPI: ${{ secrets.PYPI_TOKEN }}
        run: poetry publish --no-interaction --username __token__ --password $POETRY_PYPI_TOKEN_PYPI

This arrangement checks out the repository, installs a consistent Python version, brings in Poetry, installs dependencies based on the lock file, runs tests, builds distributions and only publishes when the workflow is triggered by a version tag. The API token used for publishing should be stored as a repository secret named PYPI_TOKEN so it is not exposed in the codebase or logs. Creating the tag is done locally with git tag v1.0.0 followed by git push origin v1.0.0, which triggers the workflow and results in a published package moments later. It is often useful to extend this with a test matrix, so the suite runs across supported Python versions, as well as caching to speed up repeated runs by reusing Poetry and pip caches keyed on the lock file.

- Project Structure

Package structure is another place where Poetry encourages clarity. A simple, consistent layout makes maintenance and onboarding easier. A typical library keeps its importable code in a package directory named to match the project name in pyproject.toml, with hyphens translated to underscores. Tests live in a separate tests directory, documentation in docs and examples in a directory of the same name. The repository root contains README.md, a licence file, the lock file and a .gitignore that excludes environment directories and build artefacts. The following tree illustrates a balanced structure for a data-oriented utility library.

data-utils/
├── data_utils/
│   ├── __init__.py
│   ├── core.py
│   ├── io.py
│   ├── analysis.py
│   └── cli.py
├── tests/
│   ├── __init__.py
│   ├── test_core.py
│   └── test_analysis.py
├── docs/
│   ├── index.md
│   └── usage.md
├── examples/
│   └── demo.ipynb
├── README.md
├── LICENSE
├── pyproject.toml
├── poetry.lock
└── .gitignore

Within the package directory, __init__.py can define a public interface and hide internal details. This allows users of the library to import the essentials without needing to know the module layout.

from .core import clean_data
from .analysis import summarise_data

__all__ = ["clean_data", "summarise_data"]

If the project offers a command-line interface, Poetry makes it simple to declare an entry point, so users can run a console command after installation. The scripts section in pyproject.toml maps a command name to a callable, in this case the main function in a cli module.

[tool.poetry.scripts]
data-utils = "data_utils.cli:main"

A basic CLI might be implemented using Click, passing arguments to internal functions and relaying progress.

import click
from data_utils import core

@click.command()
@click.argument("path")
def main(path):
    """Simple CLI example."""
    print(f"Processing {path}...")
    core.clean_data(path)
    print("Done!")

if __name__ == "__main__":
    main()

The .gitignore file should filter out anything that does not belong in version control. A sensible default for a Poetry project is as follows.

__pycache__/
*.pyc
*.pyo
*.pyd
.env
.venv
dist/
build/
*.egg-info/
.cache/
.coverage

- Testing and Documentation

Testing sits comfortably alongside this. Many projects adopt pytest because it is straightforward to use and integrates well with Poetry. Running tests through poetry run pytest ensures the virtual environment is used, and a simple unit test demonstrates the pattern.

from data_utils.core import clean_data

def test_clean_data_removes_nulls():
    data = [1, None, 2, None, 3]
    cleaned = clean_data(data)
    assert cleaned == [1, 2, 3]

Documentation can be kept in plain Markdown or built with dedicated generators. MkDocs and Sphinx are common choices for generating websites from your docs, and both can be installed as development dependencies using Poetry. Including notebooks in an examples directory is helpful for illustrating usage in richer contexts, especially for data science libraries. The README should present the essentials succinctly, covering what the project does, how to install it, a short usage example and pointers for development setup. A licence file clarifies terms of use; MIT and Apache 2.0 are widely used options in open source.
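
For example, MkDocs can be added to the development group so that it is versioned alongside the project; the same pattern applies to Sphinx, and the choice of generator here is illustrative rather than prescriptive.

poetry add --group dev mkdocs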

- Advanced CI: Quality Checks and Multi-version Testing

Once structure, tests and documentation are in order, quality checks can be expanded in the continuous integration workflow. Adding automated formatting, import sorting and linting tightens consistency across contributions. An enhanced workflow uses Black, isort and Flake8 before running tests and building, and also includes a matrix to test across multiple Python versions. It runs on pull requests as well as on tagged pushes, which means code quality and compatibility are verified before merging changes and again before publishing a release.

name: Lint, Test and Publish

on:
  push:
    tags:
      - "v*"
  pull_request:

jobs:
  build:
    runs-on: ubuntu-latest

    strategy:
      matrix:
        python-version: ["3.10", "3.11", "3.12"]

    steps:
      - name: Check out repository
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install Poetry
        run: |
          curl -sSL https://install.python-poetry.org | python3 -
          echo "$HOME/.local/bin" >> $GITHUB_PATH

      - name: Cache Poetry dependencies
        uses: actions/cache@v4
        with:
          path: |
            ~/.cache/pypoetry
            ~/.cache/pip
          key: poetry-${{ runner.os }}-${{ hashFiles('**/poetry.lock') }}
          restore-keys: |
            poetry-${{ runner.os }}-

      - name: Install dependencies
        run: poetry install --no-interaction --no-root

      - name: Check code formatting with Black
        run: poetry run black --check .

      - name: Check import order with isort
        run: poetry run isort --check-only .

      - name: Run Flake8 linting
        run: poetry run flake8 .

      - name: Run tests with pytest
        run: poetry run pytest --maxfail=1 --disable-warnings -q

      - name: Build package
        run: poetry build

      - name: Publish to PyPI
        if: startsWith(github.ref, 'refs/tags/v')
        env:
          POETRY_PYPI_TOKEN_PYPI: ${{ secrets.PYPI_TOKEN }}
        run: poetry publish --no-interaction --username __token__ --password $POETRY_PYPI_TOKEN_PYPI

This workflow builds on the earlier one by checking style and formatting before tests. If any of those checks fail, the process stops and surfaces the problems in the job logs. Caching based on the lock file reduces the time spent installing dependencies by reusing packages where nothing has changed. The matrix section ensures that the library remains compatible with the declared range of Python versions, which is especially helpful just before a release. It is possible to extend this further with coverage reports using pytest-cov and Codecov, static type checking with mypy, or pre-commit hooks to keep local development consistent with continuous integration. Publishing to TestPyPI in a separate job can help validate packaging without affecting the real index, and once outcomes look good, the main publishing step proceeds when a tag is pushed.

- Conclusion

The result of adopting Poetry is a project that states its requirements clearly, installs them reliably and produces distributions without ceremony. For new work, it removes much of the friction that once accompanied Python packaging. For existing projects, the migration path is gentle and reversible, and the gains in determinism often show up quickly in fewer environment-related issues. When paired with a small amount of automation in a continuous integration system, the routine of building, testing and publishing becomes repeatable and visible to everyone on the team. That holds whether the package is destined for internal use on a private index or a public release on PyPI.

A portable software repository comparison: PortableApps versus Portapps for Windows users

17th September 2025

Moving between computers remains a fact of life for many people, whether working across office desktops and home laptops, studying in shared facilities or visiting clients and public spaces. Installing the same software repeatedly, then recreating familiar settings, can become a routine that wastes time and raises permission hurdles. Portable software aims to sidestep that friction by running without traditional installation, carrying preferences along for the ride, and leaving little behind on host machines.

Two notable projects occupy this space for Windows users: PortableApps.com and Portapps. Each offers a different route to a similar destination, and together they show how far the idea has progressed since the early days of USB sticks and limited storage. Both platforms enable users to create self-contained software environments that can travel between machines whilst maintaining settings and data integrity.

- PortableApps.com: The Established Platform

PortableApps.com is often the first name people encounter, and with good reason. It has grown into a platform as much as a collection, providing a launcher that helps manage the entire portable environment. The project began in the early 2000s, created by John T. Haller, and has remained free and open source since then.

-- Core Architecture

The premise is straightforward. Applications are repackaged so they can live within a self-contained folder structure that can sit on removable storage or inside a cloud-synchronised folder. When launched from that location, they behave as if they were installed locally, except that their configuration and data reside in the portable directory rather than the Windows registry or system folders. As a result, moving the folder to another machine brings the software and its settings along, keeping the host computer cleaner and reducing the need for elevated privileges.

-- The Platform Ecosystem

Much of the appeal lies in the PortableApps.com Platform, a menu and suite that acts as a hub. Rather than scattering shortcuts across the desktop, the platform collects everything in one place with a menu that can sit on a USB drive or a cloud drive. From here, users can run applications, group them in folders, mark favourites and initiate updates, all with a consistent interface.

The catalogue has grown substantially, now featuring over 1,400 portable packages spanning multiple categories: Accessibility, Development, Education, Games, Graphics & Pictures, Internet, Music & Video, Office, Security and Utilities. This includes major applications like LibreOffice, Firefox, GIMP, VLC media player, and hundreds of specialised tools across every computing category. That breadth helps the platform function as a complete environment rather than a one-off fix for a particular program. A person could keep a preferred browser with extensions and bookmarks, a document editor for quick edits, an image viewer for photos and a handful of diagnostic tools, all launched from the same menu.

Because the platform is designed to operate from cloud-synchronised locations as well, some forgo physical drives and keep their PortableApps directory inside providers like Dropbox or Google Drive. That way, the same set of tools appears on every machine where the cloud client is installed, with settings following through the sync client.

- Portapps: The Modular Approach

Running alongside PortableApps is Portapps, an independent collection that also repackages Windows software to run portably, albeit with a different structure. Portapps distributes applications either as portable set-up files or as 7-Zip archives. Each title typically includes a small wrapper executable, named with a "-portable.exe" suffix, that orchestrates the portability layer.

-- Technical Implementation

That wrapper is written in Go and handles redirection of paths, environment configuration and other adjustments required to run the original application without leaving permanent traces on the host. The project is open source under the MIT licence, and many of its components live on GitHub, where users can watch releases and inspect how builds are constructed.

-- Usage and Transparency

Running a Portapps package is uncomplicated. After downloading the portable version of a supported application from the Portapps site or the relevant GitHub repository, the user extracts the files and launches the wrapper executable. The wrapper ensures that configuration and data reside in the portable directory and that the program operates without installing into Windows.

Portapps emphasises transparency around its build process. Properties and scripts are published, so observers can see how original sources are obtained and how wrappers are applied. Releases are versioned and binaries are provided, with wrappers scanned on VirusTotal to provide added confidence. The maintainers acknowledge that heuristic scanning can sometimes trigger false positives because of how the wrappers work, a reality that users should weigh against their own antivirus alerts and verification habits.

-- Application Focus and Updates

Portapps maintains a more selective catalogue of 54 applications, focusing primarily on modern software and developer tools. The collection includes popular applications like Discord, Visual Studio Code, Brave browser, VLC media player, Postman, IntelliJ IDEA, and various communication tools. The project targets contemporary software, particularly applications built with frameworks like Electron, and emphasises quality over quantity in its selections.

Recent releases continue actively, with regular updates to maintained applications. However, some applications are discontinued when the original projects become abandoned or when maintenance becomes unfeasible, demonstrating the project's pragmatic approach to software curation.

- Comparison: Platform vs Modular

The distinction between the two projects emerges in how they are structured and managed, rather than in their core aim. This creates different advantages for different use cases.

-- PortableApps.com Advantages

PortableApps offers a full platform anchored by a launcher. It provides centralised update notifications and the ability to upgrade installed portable applications whilst preserving data. It integrates back-up functions and a customisable interface that collects everything into a single, recognisable menu. This arrangement suits anyone who wants a managed, coherent environment that travels intact from one machine to another, whether on a drive or inside a cloud-synchronised folder.

The platform's maturity shows in its comprehensive feature set: automatic updates, integrated back-up systems, theme customisation and extensive language support. The sheer size of its catalogue (over 1,400 applications across 10 categories) means users can often find portable versions of most common applications they need, from basic utilities to professional software suites.

-- Portapps Advantages

Portapps takes a per-application approach centred on wrappers. It does not bundle a unified menu or a site-wide update mechanism. Instead, it focuses on packaging individual programs so that each can run on its own from a portable directory. For some, that modularity is appealing because it keeps each application independent and allows for granular control over what gets updated and when.

The transparency of Portapps is particularly notable. All source code, build scripts and packaging processes are openly available on GitHub. This makes it easier for technically inclined users to understand exactly how applications are made portable and to contribute improvements or fixes. The project's focused approach means its 54 applications are typically modern, well-maintained packages that target contemporary software needs, particularly in development and communication tools.

-- Trade-offs and Limitations

Both approaches share similar constraints. Performance can lag when running from slow USB flash drives, especially with applications that read and write frequently. A modern external SSD or high-quality USB 3.x drive mitigates this, but older media can make the difference noticeable.

Compatibility relies in part on the host Windows installation. Some portable programs require certain components to be present or struggle if the operating system is old or tightly locked down by policy. Security considerations apply to both: a portable device can be lost or stolen, so using encryption or secure storage matters if sensitive data are involved.

Another constraint is access to system-level features. Programs that need drivers, system services or administrative rights may not function as expected in portable form. Updates in Portapps require more manual intervention compared to PortableApps' centralised update system.

-- Which to Choose

The choice often comes down to preferences and requirements. Those who want a curated catalogue with a central launcher, integrated updates and back-up features will likely benefit from the PortableApps.com Platform. It reduces administrative overhead by keeping everything in one place and by handling upgrades whilst leaving settings untouched.

Those who prefer to choose individual portable packages, appreciate the transparency of wrapper-based builds, or focus on a subset of modern applications may lean towards Portapps. Both coexist comfortably because their aims overlap, yet their methods differ, and nothing stops a user from mixing them if that suits a particular workflow, though running two separate structures does introduce more to manage.

- Practical Implementation

Setting up a portable environment generally begins with choosing where it will live. A fast USB 3.x flash drive or an external SSD keeps load times brisk and reduces frustration. If removable media is not desirable, a folder inside a cloud synchronisation service provides similar flexibility, just without the need to carry hardware.

-- PortableApps Setup

In the case of PortableApps, installing the platform to the chosen location yields a menu that can then be populated with software drawn from the catalogue. Updates can be triggered from within the platform and back-ups made as snapshots of the entire environment. The integrated app store makes discovering and installing new portable applications straightforward.

-- Portapps Setup

For Portapps, the process is more manual. Individual applications are selected from the website or GitHub, downloaded either as portable set-ups or archives, unpacked to a chosen directory, then started using the "-portable.exe" wrapper. Keeping track of updates often means revisiting the releases page for each application or subscribing to notifications.

-- Security Considerations

Security merits attention at the outset. Losing a drive can mean losing data, so encrypting the portable directory is wise, whether by encrypting the entire device with tools like BitLocker To Go or by placing the portable environment inside a container created with software such as VeraCrypt. Public or shared machines can carry malware risks, so scanning hosts when possible and treating sensitive actions with caution remains sensible.

Verifying downloads by checking hashes or signatures when provided, and scanning portable applications with antivirus software, adds another layer of reassurance. It is also useful to remember that even well-designed portable applications may leave temporary traces because Windows itself writes certain entries as part of normal operation. The objective is to limit permanent change, not to circumvent the operating system's behaviour entirely.

-- Performance Optimisation

Performance can be improved with a few choices. Using faster storage makes the largest difference, particularly for larger applications that read and write many files. Keeping the portable directory in a location that remains consistently available to a cloud client avoids sync stalls, and selecting a machine's local drive rather than a slow network path reduces latency. Ensuring that the portable environment is not subject to aggressive antivirus scanning on every read can sometimes help, though that has to be balanced against security policies.

- Final Remarks

Portable software has matured from a niche convenience into a practical way of working that respects the realities of shared and changing environments. By focusing on containment, reducing dependency on installation and making updates and back-ups straightforward, projects like PortableApps and Portapps make it easier to carry a personal toolkit across diverse Windows machines.

The two platforms serve overlapping but distinct needs. PortableApps.com excels as a comprehensive, managed environment suitable for users who want everything integrated and maintained through a single interface. Its extensive catalogue and automated features make it particularly attractive for those building complete portable computing environments.

Portapps appeals to users who prefer transparency, modularity and direct control over individual applications. Its open development model and focused approach to specific modern applications make it valuable for technically minded users or those with specific software requirements.

Use cases abound for both approaches. Students and professionals who switch between school, work and home can keep a consistent environment without altering each machine. Technicians often carry diagnostic and repair tools that run without installation so they can assist on any PC they encounter. Travellers value having a browser and email client with their own preferences ready to use on shared computers.

With thought given to security, performance and management, both PortableApps and Portapps can add consistency to a computing life that is increasingly spread across locations and devices, all without imposing on the host systems that make it possible. The choice between them depends on whether one prioritises integrated management or modular control, but both represent mature approaches to an enduring challenge in modern computing.

Python productivity: Building better code through design, performance and scale

12th September 2025

Python's success in data science and beyond stems from more than just readable syntax. It represents a coherent philosophy where errors guide development, explicitness prevents bugs, modern tooling enforces quality, performance comes from purpose-built engines, and scaling extends rather than replaces familiar patterns. Understanding these principles transforms everyday coding from a series of individual tasks into a systematic approach to building robust, maintainable and efficient systems.

- Error-Driven Development as a Design Philosophy

Python treats errors not as failures, but as design features that surface problems early and prevent subtle defects later. The language embodies an "easier to ask forgiveness than permission" philosophy, attempting operations first and objecting meaningfully when they cannot proceed.

Consider how Python handles basic operations. A SyntaxError appears immediately when code violates grammatical rules: if True print("hello") triggers an immediate complaint with a caret pointing to the problematic location. Python neither guesses intentions nor continues with broken syntax because this guarantee of clear structure keeps code understandable across projects and platforms.

Sequence operations demonstrate similar principles. When code attempts to access lst[5] on a three-element list, Python raises IndexError: list index out of range rather than silently padding or expanding the sequence. This deliberate failure prevents hidden logic errors in loops and aggregations by forcing explicit checks of assumptions about data size.

Dictionary lookups follow the same pattern. Accessing a non-existent key with d['missing'] yields KeyError: 'missing' rather than inventing placeholder values. This explicit failure catches typos and unclear control flow whilst enabling defensive programming patterns through try/except blocks.
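
A minimal sketch shows the defensive pattern in practice, using a hypothetical settings dictionary and a documented fallback value.

settings = {"theme": "dark"}

try:
    timeout = settings["timeout"]
except KeyError:
    # The key is genuinely absent, so fall back to an explicit default
    timeout = 30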

Name resolution errors like NameError and UnboundLocalError enforce clear scoping rules without creating variables accidentally or resolving names to unexpected contexts. Type discipline appears at runtime through TypeError for incorrect argument types and ValueError for correct types with inappropriate values. Each error message identifies which contract has been violated, directing fixes to either the object passed or the value it contains.

Assertions provide a final layer of optional verification. The assert statement allows code to state assumptions explicitly, failing with meaningful messages when invariants do not hold. This narrows the search space for defects by making expectations visible and providing immediate context for failures.
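
As a brief illustration, and assuming a hypothetical list of normalised readings, an invariant can be stated directly so that a violation fails loudly with context.

readings = [0.2, 0.4, 0.9]

# Fails immediately with a clear message if the assumption is broken
assert all(0 <= value <= 1 for value in readings), "readings must lie between 0 and 1"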

Taking these error signals seriously nudges development towards explicitness and clarity, establishing a foundation for all subsequent quality improvements.

- Explicitness Over Implicitness

Making intentions clear through code structure prevents ambiguity, aids tooling and simplifies reuse. This principle manifests across multiple areas of Python development, from data structures to function signatures.

Raw dictionaries offer flexibility but create fragility. A typo in a key or missing field becomes a runtime KeyError with no contract about required contents. Using @dataclass to define structured objects like User with id, email, full_name, status and optional last_login provides clear interfaces with minimal overhead. Type hints and IDE support make attribute access unambiguous, whilst construction fails early when required fields are absent.
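
A sketch of that User dataclass, with the field types assumed for illustration, might look like the following.

from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class User:
    id: int
    email: str
    full_name: str
    status: str
    last_login: Optional[datetime] = None

# Construction fails early if a required field is missing
user = User(id=1, email="john@example.com", full_name="John Smith", status="active")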

For cases requiring validation, pydantic models build on this foundation. An email field declared as EmailStr automatically validates format, while custom validators can restrict status values to specific options such as 'active', 'inactive' or 'pending'. The resulting models are self-documenting and shield downstream code from invalid data.
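
A hedged sketch of the same idea with pydantic, using version 2 syntax and assuming the email-validator extra is installed for EmailStr, could read as follows.

from pydantic import BaseModel, EmailStr, field_validator

class User(BaseModel):
    id: int
    email: EmailStr
    status: str = "pending"

    @field_validator("status")
    @classmethod
    def check_status(cls, value: str) -> str:
        # Restrict status to the closed set of allowed values
        if value not in {"active", "inactive", "pending"}:
            raise ValueError("status must be 'active', 'inactive' or 'pending'")
        return value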

Function parameters representing closed sets of options benefit from similar treatment. Plain strings invite typos and lack autocomplete support. Defining enums such as OrderStatus with PENDING, SHIPPED and DELIVERED makes possible states explicit whilst helping both developers and tools. Passing OrderStatus.SHIPPED to process_order reveals intention clearly and enables straightforward comparisons against enum members.
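
A small example, with the processing logic reduced to a placeholder, shows how the enum makes the call site self-describing.

from enum import Enum

class OrderStatus(Enum):
    PENDING = "pending"
    SHIPPED = "shipped"
    DELIVERED = "delivered"

def process_order(order_id: int, status: OrderStatus) -> None:
    # Comparison is against an enum member, not a bare string
    if status is OrderStatus.SHIPPED:
        print(f"Order {order_id} is on its way")

process_order(42, OrderStatus.SHIPPED)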

Function signatures become clearer through keyword-only arguments, enforced with a bare star in definitions. A function like create_user(name, email, *, admin=False, notify=True, temporary=False) forces call sites to write create_user(..., admin=True, notify=False) rather than passing sequences of ambiguous boolean values. The resulting calls read almost as documentation.

File path operations improve through object-oriented design. The pathlib module treats paths as objects where joining uses natural / syntax, directory creation uses mkdir, suffix changes use with_suffix, and text operations use read_text and write_text. Code becomes shorter, more portable and less prone to string manipulation errors.
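
A short sketch with hypothetical paths illustrates those operations together.

from pathlib import Path

reports = Path("output") / "reports"
reports.mkdir(parents=True, exist_ok=True)

summary = reports / "summary.txt"
summary.write_text("All checks passed\n")

# Derive a sibling path with a different suffix and read the original back
backup = summary.with_suffix(".bak")
print(summary.read_text())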

These patterns consistently replace implicit assumptions with explicit contracts, making code intention more visible and reducing the cognitive load of understanding system behaviour.

- Structural Code Quality Through Tooling and Patterns

Sustainable code quality emerges from systematic approaches to organisation, testing and maintenance rather than individual discipline alone. Several key patterns and tools work together to create robust, readable codebases.

Control flow benefits from handling error conditions early rather than nesting deeply. Guard clauses invert the traditional structure so that invalid states return immediately, whilst main logic remains non-indented when preconditions are met. A process_payment function checking order.is_valid, then user.has_payment_method, then available funds before performing charges reads linearly. Exceptions during processing are caught precisely, errors logged with context, and functions return deterministically.
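
A sketch of that shape, in which charge, PaymentError and the attribute names on order and user are stand-ins for whatever a real project provides, keeps the main path flat and the failure points explicit.

import logging

log = logging.getLogger(__name__)

class PaymentError(Exception):
    """Raised by the hypothetical payment gateway when a charge cannot proceed."""

def charge(user, amount):
    # Stand-in for a real payment gateway call
    pass

def process_payment(order, user):
    # Guard clauses: invalid states return immediately
    if not order.is_valid:
        return False
    if not user.has_payment_method:
        return False
    if user.available_funds < order.total:
        return False

    try:
        charge(user, order.total)
    except PaymentError as exc:
        log.error("Charge failed for order %s: %s", order.id, exc)
        return False
    return True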

Even beloved list comprehensions have limits. When filtering and transformation logic become complex, sprawling comprehensions become opaque. Extracting predicates into named functions like is_valid_premium_user restores readability by giving conditions clear names. Where multiple checks and transformations are needed, conventional loops with early continue statements may prove more straightforward and debuggable.
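
Assuming user objects with a few illustrative attributes, the extraction looks like this.

def is_valid_premium_user(user) -> bool:
    # The compound condition gains a readable name
    return user.is_active and user.plan == "premium" and not user.is_suspended

premium_users = [user for user in all_users if is_valid_premium_user(user)]  # all_users: any iterable of user objects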

Pure functions that accept all inputs as parameters and return results without changing external state simplify testing and reuse. Moving from designs where functions mutate global totals and read from global inventories to approaches where calculations accept prices, quantities and discounts as inputs removes hidden coupling. This enables deterministic testing of edge cases and reasoning about code without tracking changing state.
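
A small sketch contrasts with the mutating design: the calculation below reads nothing from module-level state and changes nothing outside itself, so edge cases can be tested with plain inputs. The argument names are illustrative.

def order_total(prices, quantities, discount=0.0):
    # Pure function: the result depends only on the arguments
    subtotal = sum(price * qty for price, qty in zip(prices, quantities))
    return subtotal * (1 - discount)

print(order_total([10.0, 5.0], [2, 3], discount=0.5))  # 17.5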

Documentation ties these practices together. Docstrings explaining what functions do, parameters they accept, values they return and including examples make codebases self-explanatory. Combined with tooling, docstrings serve as both reference and executable documentation.

Automation enforces consistency where human attention falters. Formatters like Black, linters like Ruff, static type checkers like mypy and import organisers like isort can run before each commit using pre-commit. Style issues and common mistakes are caught automatically, freeing mental capacity for higher-level concerns.

When handling errors, resist blanket except: statements that swallow everything from syntax errors to keyboard interrupts. Be specific where possible, catching ConnectionError, ValueError or database errors and handling each appropriately. When catch-alls are necessary, prefer except Exception as e: and log full tracebacks so that unexpected failures remain visible and traceable.

- Performance Through Modern Engines

Once code achieves cleanliness and robustness, performance becomes the next frontier. Traditional tools often leave substantial speed gains on the table, particularly for data-intensive work where single-threaded processing creates bottlenecks on modern hardware.

Polars, a DataFrame library written in Rust, addresses these limitations by making parallelism the default whilst providing both eager and lazy execution modes. Benchmarks on datasets of around 580,000 rows show Polars completing filtering roughly four times faster than Pandas, aggregation over twenty times faster, groupby operations eight times faster, sorting three times faster, and feature engineering five times faster. These gains stem from fundamental architectural differences rather than incremental optimisations.

The performance improvement requires a shift in mental model. Instead of writing sequential operations that execute immediately, you can batch expressions and let Polars parallelise them automatically. Creating both profit and margin with one with_columns call signals that these calculations can proceed together. Lazy evaluation extends this approach further. Building pipelines with pl.scan_csv('large_file.csv').filter(...).group_by(...).agg(...).collect() lets Polars construct query plans, then optimises them before execution. Filters are pushed down so less data reaches later stages, only selected columns are read, and compatible operations are combined.
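
A lazy pipeline of the sort described above might look like the following sketch, with the file name taken from the text and the column names borrowed from the sales example that follows.

import polars as pl

# Nothing executes until collect(), so the whole plan can be optimised first
summary = (
    pl.scan_csv("large_file.csv")
    .filter(pl.col("order_value") > 500)
    .group_by("customer_id")
    .agg(pl.col("order_value").sum().alias("total_spend"))
    .collect()
)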

Expressiveness comes from an expression system applying operations across columns succinctly. Where Pandas encourages thinking in terms of single columns assigned individually, Polars supports expressions like pl.col(['revenue', 'cost']) * 1.1 applied to multiple columns simultaneously. Familiar transformations translate directly: pl.read_csv('sales.csv') replaces pd.read_csv, selection and filtering become df.filter(pl.col('order_value') > 500).select(['customer_id', 'order_value']), new columns are created with df.with_columns(((pl.col('revenue') - pl.col('cost')) / pl.col('revenue')).alias('profit_margin')), and operations utilise all available cores automatically.

Memory efficiency improves through Apache Arrow's columnar format, storing data more compactly and avoiding NumPy-based overhead. CSV files of around 2 GB requiring roughly 10 GB of RAM in Pandas often process in approximately 4 GB with Polars. This difference can determine whether workflows run smoothly on laptops or require chunking strategies.

- Scaling Beyond Single Processes

When single processes reach their limits, two prominent approaches help scale Python across cores and machines whilst preserving familiar patterns and mental models.

Dask extends NumPy, Pandas and scikit-learn idioms to larger-than-memory datasets by partitioning arrays, DataFrames and computations then scheduling them in parallel. Its primary abstractions are dask.dataframe and dask.array, along with delayed task graphs. It excels for scalable batch processing, feature engineering and out-of-core work where the mental model remains close to the PyData stack. Integration with scikit-learn and XGBoost is mature, work-stealing schedulers are sophisticated, and detailed dashboards provide visibility. Clusters can be managed natively or through systems like Kubernetes and YARN.

For large-scale data cleaning and feature engineering, Dask provides natural extensions. Reading many CSV files from storage with dd.read_csv('s3://data/large-dataset-*.csv'), filtering rows with df[df['amount'] > 100], applying transformations per partition, then writing Parquet with df.to_parquet('s3://processed/output/') looks like Pandas but runs in parallel and out of core. Array computations through dask.array handle chunked operations so that x.mean(axis=0).compute() runs across partitions without exhausting memory.
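
Put together as one sketch, with the bucket paths and filter taken from the text and the extra transformation invented for illustration, the pattern reads much like Pandas.

import dask.dataframe as dd

# Each partition is read, filtered and written in parallel, out of core
df = dd.read_csv("s3://data/large-dataset-*.csv")
df = df[df["amount"] > 100]
df = df.assign(amount_thousands=df["amount"] / 1000)  # illustrative per-partition transformation
df.to_parquet("s3://processed/output/")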

Ray takes a more general approach to distributed computing through remote functions and actors. It suits workloads with many independent Python functions, stateful services and complex machine learning pipelines. A growing ecosystem includes Ray Tune for hyperparameter optimisation, Ray Train for multi-GPU training, Ray Serve for model serving, and RLlib for reinforcement learning. Scheduling is dynamic and actor-based, cluster management integrates with cloud providers, and scalability handles applications requiring control and flexibility.

For model training requiring many configuration explorations, Ray Tune provides schedulers and search strategies. Training functions can be wrapped and launched across workers with tune.run, with methods like ASHA stopping unpromising runs early. Integration with popular libraries means scaling experiments requires minimal code changes. Ray Serve turns model classes exposing __call__ methods into scalable services with @serve.deployment and serve.run, handling routing and scaling automatically.

- Incremental Adoption and Pragmatic Choices

The most sustainable approach to improving Python productivity involves gradual implementation rather than wholesale changes. Each improvement builds on previous ones, creating compound benefits over time whilst minimising disruption to existing workflows.

Adopting Polars illustrates this principle well. The first step can be simply loading data with pl.read_csv('big_file.csv') for faster I/O, then converting to Pandas with .to_pandas() if the rest of the pipeline expects Pandas objects. As comfort grows, expression-oriented patterns yield dividends: filtering then adding multiple columns in single chained calls, so Polars can optimise across steps. Full benefits appear when entire pipelines are expressed lazily, but this transition can happen gradually as understanding deepens.

Similarly, clean code practices can be introduced incrementally. Start by letting error messages guide fixes rather than suppressing them. Refactor one fragile dictionary into a dataclass when maintenance becomes painful. Extract a complex list comprehension into a named function when debugging becomes difficult. Each change teaches principles that apply more broadly whilst delivering immediate benefits.

Scaling decisions are often pragmatic rather than theoretical. If work centres on DataFrames and arrays with minimal conceptual shift from Pandas or NumPy, Dask likely delivers what you need. If workloads mix training, tuning and serving or require orchestrating many concurrent Python tasks with fine-grained control, Ray's abstractions and libraries provide better matches. Trying each approach on representative workflow slices quickly clarifies which will serve best.

The choice between tools should be driven by actual requirements rather than perceived sophistication. A single machine with Polars may outperform a small cluster running Pandas. A well-structured monolithic application may be more maintainable than a prematurely distributed system. The key is understanding when complexity serves genuine needs rather than adding overhead.

- Synthesis: The Compound Nature of Python Productivity

These themes work together rather than in isolation. Error-driven development creates habits that surface problems early. Explicit code structures make intentions clear to both humans and tools. Quality practices through tooling and patterns create sustainable foundations. Modern engines provide performance without sacrificing readability. Scaling approaches extend familiar patterns rather than replacing them. Incremental adoption ensures changes compound rather than disrupt.

The result is a coherent approach to Python development where each improvement reinforces others. Explicit data structures work better with static type checkers. Pure functions are easier to test and parallelise. Clean error handling integrates naturally with distributed systems. Modern DataFrame engines benefit from lazy evaluation patterns that also improve code clarity.

This synthesis explains Python's enduring appeal in data science and beyond. The language welcomes beginners with approachable syntax, whilst scaling to demanding production work without losing clarity. The ecosystem encourages practices that speed up teams over time rather than optimising for immediate gratification. The same principles that guide small scripts apply to large systems, creating a path for continuous improvement rather than periodic rewrites.

Start small: let error messages guide one fix, refactor one fragile dictionary into a dataclass, switch one slow operation to Polars, or run one hyperparameter sweep with Ray Tune. The improvements compound, and the foundations established early enable sophisticated capabilities later without fundamental changes to approach or mindset.

Platform-independent file deletion in SAS programming

11th September 2025

There are times when you need to delete a file from within a SAS program, one that is not among its inbuilt types such as datasets or catalogues. Many choose to use X commands or the SYSTASK facility, even though these issue operating system-specific commands that obstruct portability. It should also be recalled that system administrators can make these facilities unavailable to users. Thus, it is better to use methods that are provided within SAS itself.

The first step is to define a file reference pointing to the file that you need to delete:

filename myfile '<full path of file to be removed>';

With that accomplished, you can move on to deleting the file in a null data step, one where no output dataset is generated. The FDELETE function does that in the code below, using the MYFILE file reference declared already. The step captures a return code that is used to give feedback to the user, with 0 indicating success and non-zero values indicating failure.

data _null_;
  rc = fdelete('myfile');
  if rc = 0 then put 'File deleted successfully.';
  else put 'Error deleting file. RC=' rc;
run;

With the above out of the way, you can then clear the file reference to tidy things up afterwards:

filename myfile clear;

Some may consider SAS programs to be single-use items that never move from one system to another. However, I have been involved in several such migrations, particularly between Windows, UNIX and Linux, so I know the value of keeping things streamlined, and some SAS programs are multi-use anyway. Anything that cuts down on workload when there is much else to do cannot be discounted.

Mixing local and cloud capabilities in an AI toolkit

9th September 2025

The landscape of AI development is shifting towards systems that prioritise local control, privacy and efficient resource management whilst maintaining the flexibility to integrate with external services when needed. This guide explores how to build a comprehensive AI toolkit that balances these concerns through seven key principles: local-first architecture, privacy preservation, standardised tool integration, workflow automation, autonomous agent development, efficient resource management and multi-modal knowledge handling.

- Local-First Architecture and Control

The foundation of a robust AI toolkit begins with maintaining direct control over core components. Rather than relying entirely on cloud services, a local-first approach provides predictable costs, enhanced privacy and improved reliability whilst still allowing selective use of external resources.

Llama-Swap exemplifies this philosophy as a lightweight proxy that manages multiple language models on a single machine. This tool listens for OpenAI-style API calls, inspects the model field in each request, and ensures that the correct backend handles that call. The proxy intelligently starts or stops local LLM servers so only the required model runs at any given time, making efficient use of limited hardware resources.

Setting up this system requires minimal infrastructure: Python 3, Homebrew on macOS for package management, llama.cpp for hosting GGUF models locally and the Hugging Face CLI for model downloads. The proxy itself is a single binary that can be configured through a simple YAML file, specifying model paths and commands. This approach transforms model switching from a manual process of stopping and starting different servers into a seamless experience where clients can request different models through a single port.

The local-first principle extends beyond model hosting. Obsidian demonstrates this with its markdown-based knowledge management system that stores everything locally whilst providing rich linking capabilities and plugin extensibility. This gives users complete control over their data, whilst maintaining the ability to sync across devices when desired.

- Privacy and Data Sovereignty

Privacy considerations permeate every aspect of AI toolkit design. Local processing inherently reduces exposure of sensitive data to external services, but even when cloud services are necessary, careful evaluation of data handling practices becomes crucial.

Voice processing illustrates these concerns clearly. ElevenLabs offers high-quality text-to-speech and voice cloning capabilities but requires careful assessment of consent and security policies when handling voice data. Similarly, services like NoteGPT that process documents and videos must be evaluated against regional regulations such as GDPR, particularly when handling sensitive information.

The principle of data minimisation suggests using local processing wherever feasible and cloud services only when their capabilities significantly outweigh privacy concerns. This might mean running smaller language models locally for routine tasks, whilst reserving larger cloud models for complex reasoning that exceeds local capacity.

- Tool Integration and Standardisation

As AI systems become more sophisticated, the ability to integrate diverse tools through standardised protocols becomes essential. The Model Context Protocol (MCP) addresses this need by defining how lightweight servers present databases, file systems and web services to AI models in a secure, auditable manner.

MCP servers act as bridges between AI models and real systems, whilst MCP clients are applications that discover and utilise these servers. This standardisation enables a rich ecosystem of tools that can be mixed and matched according to specific needs.

Several clients demonstrate different approaches to MCP integration. Claude Desktop auto-starts configured servers on launch, making tools immediately available. Cursor AI and Windsurf integrate MCP servers directly into coding environments, allowing function calls to route to custom servers automatically. Continue provides open-source alternatives for VS Code and JetBrains, whilst LibreChat offers a flexible chat interface that can connect to various model providers and MCP servers.

The standardisation extends to development workflows through tools like Claude Code, which integrates with GitHub repositories to automate routine tasks. By creating a Claude GitHub App, developers can use natural language comments to trigger actions like generating Docker configurations, reviewing code or updating documentation.

- Workflow Automation and Productivity

Effective AI toolkits streamline repetitive tasks and augment human decision-making, rather than replacing it entirely. This automation spans from simple content generation to complex research workflows that combine multiple tools and services.

A practical research workflow demonstrates this integration. Beginning with a focused question, Perplexity AI can generate citation-backed reports using its deep research capability. These reports, exported as PDFs, can then be uploaded to NotebookLM for interactive exploration. NotebookLM transforms static content into searchable material, generates audio overviews that render complex topics as podcast-style conversations, and builds mind maps to reveal relationships between concepts.

This multi-stage process turns surface reading into grounded understanding by enabling different modes of engagement with the same material. The automation handles the mechanical aspects of research synthesis, whilst preserving human judgement about relevance and interpretation.

Repository management represents another automation frontier. GitHub integrations can handle issue triage, code review, documentation updates and refactoring through natural language instructions. This reduces cognitive overhead for routine maintenance whilst maintaining developer control over significant decisions.

- Agentic AI and Autonomous Systems

The evolution from reactive prompt-response systems to goal-oriented agents represents a fundamental shift in AI system design. Agentic systems can plan across multiple steps, initiate actions when conditions warrant, and pursue long-running objectives with minimal supervision.

These systems typically combine several architectural components: a reasoning engine (usually an LLM with structured prompting), memory layers for preserving context, knowledge bases accessible through vector search and tool interfaces that standardise how agents discover and use external capabilities.

Patterns like ReAct interleave reasoning steps with tool calls, creating observe-think-act loops that enable continuous adaptation. Modern AI systems employ planning-first agents that formulate strategies before execution and adapt dynamically, alongside multi-agent architectures that coordinate specialist roles through hierarchical or peer-to-peer protocols.
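
A stripped-down version of that observe-think-act loop might look like the sketch below, where call_model is a scripted stand-in for a real LLM call and TOOLS is a toy registry; both are assumptions made purely to show the control flow.

from datetime import date

def call_model(transcript: str) -> str:
    # Stand-in for an LLM call: returns scripted responses that exercise the loop.
    if "Observation:" not in transcript:
        return "Action: today"
    return "Final: the date has been looked up."

TOOLS = {
    "today": lambda _: date.today().isoformat(),
    "search": lambda query: f"(pretend search results for: {query})",
}

def run_agent(goal: str, max_steps: int = 5) -> str:
    transcript = f"Goal: {goal}\n"
    for _ in range(max_steps):
        reply = call_model(transcript)            # think
        transcript += reply + "\n"
        if reply.startswith("Final:"):            # the model declares it is finished
            return reply.removeprefix("Final:").strip()
        if reply.startswith("Action:"):           # act, then observe
            name, _, arg = reply.removeprefix("Action:").strip().partition(" ")
            observation = TOOLS.get(name, lambda a: "unknown tool")(arg)
            transcript += f"Observation: {observation}\n"
    return "Stopped without reaching a final answer."

print(run_agent("What is today's date?"))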

Practical applications illustrate these concepts clearly. An autonomous research agent might formulate queries, rank sources, synthesise material and draft reports, demonstrating how complex goals can be decomposed into manageable subtasks. A personal productivity assistant could manage calendars, emails and tasks, showing how agents can integrate with external APIs whilst learning user preferences.

Safety and alignment remain paramount concerns. Constraints, approval gates and override mechanisms guard against harmful behaviour, whilst feedback mechanisms help maintain alignment with human intent. The goal is augmentation rather than replacement, with human oversight remaining essential for significant decisions.

- Resource Management and Efficiency

Efficient resource utilisation becomes critical when running multiple AI models and services on limited hardware. This involves both technical optimisation and strategic choices about when to use local versus cloud resources.

Llama-Swap's selective concurrency feature exemplifies intelligent resource management. Whilst the default behaviour runs only one model at a time to conserve resources, groups can be configured to allow several smaller models to remain active together whilst maintaining swapping for larger models. This provides predictable resource usage without sacrificing functionality.

Model quantisation represents another efficiency strategy. GGUF variants of models like SmolLM2-135M-Instruct and Qwen2.5-0.5B-Instruct can run effectively on modest hardware whilst still providing distinct capabilities for different tasks. The trade-off between model size and capability can be optimised for specific use cases.
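
Running one of these quantised models locally can be as simple as the sketch below using the llama-cpp-python bindings; the GGUF file path and generation settings are assumptions, and the same file would equally serve llama.cpp's own server or a Llama-Swap configuration.

# Load a quantised GGUF model with llama-cpp-python (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="models/qwen2.5-0.5b-instruct-q4_k_m.gguf",  # hypothetical local file
    n_ctx=2048,      # context window to allocate
    verbose=False,
)

result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Give me three tips for naming variables."}],
    max_tokens=128,
)
print(result["choices"][0]["message"]["content"])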

Cloud services complement local resources by handling computationally intensive tasks that exceed local capacity. The key is making these transitions seamless, so users can benefit from both approaches without managing complexity manually.

- Multi-Modal Knowledge Management

Modern AI toolkits must handle diverse content types and enable fluid transitions between different modes of interaction. These span text processing, audio generation, visual content analysis and format conversion.

NotebookLM demonstrates sophisticated multi-modal capabilities by accepting various input formats (PDFs, images, tables) and generating different output modes (summaries, audio overviews, mind maps, study guides). This flexibility enables users to engage with information in ways that match their learning preferences and situational constraints.

NoteGPT extends this concept to video and presentation processing, extracting transcripts, segmenting content and producing summaries with translation capabilities. The challenge lies in preserving nuance during automated processing whilst making content more accessible.

Integration between different knowledge management approaches creates additional value. Notion's workspace approach combines notes, tasks, wikis and databases with recent additions like email integration and calendar synchronisation. Evernote focuses on mixed media capture and web clipping with cross-platform synchronisation.

The goal is creating systems that can capture information in its natural format, process it intelligently, and present it in ways that facilitate understanding and action.

- Conclusion

Building an effective AI toolkit requires balancing multiple concerns: maintaining control over sensitive data whilst leveraging powerful cloud services, automating routine tasks whilst preserving human judgement, and optimising resource usage whilst maintaining system flexibility. The market demand for these skills is growing rapidly, with companies actively seeking professionals who can implement RAG systems, build reliable agents and manage hybrid AI architectures.

The local-first approach provides a foundation for this balance, giving users control over their data and computational resources whilst enabling selective integration with external services. RAG has evolved from a technical necessity for small context windows to a strategic choice for cost reduction and reliability improvement. Standardised protocols like MCP make it practical to combine diverse tools without vendor lock-in. Workflow automation reduces cognitive overhead for routine tasks, and agentic capabilities enable more sophisticated goal-oriented behaviour.

Success depends on thoughtful integration rather than simply accumulating tools. The most effective systems combine local processing for privacy-sensitive tasks, cloud services for capabilities that exceed local resources, and standardised interfaces that enable experimentation and adaptation as needs evolve. Whether the goal is reducing API costs through efficient RAG implementation or building agents that prevent hallucinations through grounded retrieval, the principles remain consistent: maintain control, optimise resources and preserve human oversight.

This approach creates AI toolkits that are not only adaptable, secure and efficient but also commercially viable and career-relevant in a rapidly evolving landscape where the ability to build reliable, cost-effective AI systems has become a competitive necessity.

Fixing Crontab editor permissions for www-data

3rd September 2025

There are times when I set jobs to run using the web server account www-data for various website maintenance tasks. To ensure that I am doing this for the right account, I issue the following command:

sudo -u www-data crontab -e

However, doing so on this server yielded the following:

touch: cannot touch '/var/www/.selected_editor': Permission denied
Unable to create directory
/var/www/.local/share/nano/: No such file or directory
It is required for saving/loading search history or cursor positions.
No modification made

While things otherwise worked as they should with nano as the editor, I felt it best to avoid such output if I could. Thus, I modified the command like this:

sudo -u www-data HOME=/tmp crontab -e

This sets the home directory for www-data to /tmp to allow the setting of an editor, at least on an ephemeral basis. The root cause of the messages is that www-data is not a login account like others and has no writable home area of its own. The workaround above sidesteps that, without the artificiality of creating a www-data folder in the /home directory. Some might avoid the whole business by using the Vi editor, but nano suits me better.

PandasGUI: A simple solution for Pandas DataFrame inspection from within VSCode

2nd September 2025

One of the things that I miss about Spyder when running Python scripts is the ability to look at DataFrames easily. Recently, I was checking a VAT return, only for tmux to truncate how much of the DataFrame I could see in the output from the print function. While closing tmux might have been an idea, I sought a DataFrame windowing alternative instead. That led me to the pandasgui package, which did exactly what I needed, apart from pausing the script execution to show me the data. The installation was done using pip:

pip install pandasgui

Once that completed, I could use the following code construct to accomplish what I wanted:

import pandasgui

pandasgui.show(df)

In my case, there were several lines between the two lines above. Nevertheless, the first line made the pandasgui package available to the script, while the second one displayed the DataFrame in a GUI with scrollbars and cells, among other things. That was close enough to what I wanted to let me complete the task at hand.
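
For anyone wanting a quick self-contained test before wiring it into a real script, something like the following should suffice, with the small sample DataFrame standing in for whatever data needs inspecting:

# Throwaway check of pandasgui with a small sample DataFrame.
import pandas as pd
import pandasgui

df = pd.DataFrame({
    "invoice": ["A101", "A102", "A103"],
    "net": [120.00, 80.50, 310.25],
    "vat": [24.00, 16.10, 62.05],
})

pandasgui.show(df)  # opens the GUI and pauses the script until the window is closed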

AI's ongoing struggle between enterprise dreams and practical reality

1st September 2025

Artificial intelligence is moving through a period shaped by three persistent tensions. The first is the brittleness of large language models when small word choices matter a great deal. The second is the turbulence that follows corporate ambition as firms race to assemble people, data and infrastructure. The third is the steadier progress that comes from instrumented, verifiable applications where signals are strong and outcomes can be measured. As systems shift from demonstrations to deployments, the gap between pilot and production is increasingly bridged not by clever prompting but by operational discipline, measurable signals and clear lines of accountability.

Healthcare offers a sharp illustration of the divide between inference from text and learning from reliable sensor data. Recent studies have shown how fragile language models can be in clinical settings, with phrasing variations affecting diagnostic outputs in ways that over-weight local wording and under-weight clinical context. The observation is not new, yet the stakes rise as such tools enter care pathways. Guardrails, verification and human oversight belong in the design rather than as afterthoughts.

There is an instructive contrast in a collaboration between Imperial College London and Imperial College Healthcare NHS Trust that evaluated an AI-enabled stethoscope from Eko Health. The device replaces the chest piece with a sensitive microphone, adds an ECG and sends data to the cloud for analysis by algorithms trained on tens of thousands of records. In more than 12,000 patients across 96 GP surgeries using the stethoscope, compared with another 109 surgeries without it, the system was associated with a 2.3-fold increase in heart failure detection within a year, a 3.5-fold rise in identifying often symptomless arrhythmias and a 1.9-fold improvement in diagnosing valve disease. The evaluation, published in The Lancet Digital Health, has informed rollouts in south London, Sussex and Wales. High-quality signals, consistent instrumentation and clinician-in-the-loop validation lift performance, underscoring the difference between inferring too much from text and building on trustworthy measurements.

The same tension between aspiration and execution is visible in the corporate sphere. Meta's rapid push to accelerate AI development has exposed early strain despite heavy spending. Mark Zuckerberg committed around $14.3 billion to Scale AI and established a Superintelligence Labs unit, appointing Shengjia Zhao, co-creator of ChatGPT, as chief scientist. Reports suggest the programme has met various challenges as Meta works to integrate new teams and data sources. Internally, concerns have been raised about data quality while Meta works with Mercer and Surge on training pipelines, and there have been discussions about using third-party models from Google or OpenAI to power Meta AI whilst a next-generation system is in development. Consumer-facing efforts have faced difficulties. Meta removed AI chatbots impersonating celebrities, including Taylor Swift, after inappropriate content reignited debate about consent and likeness in synthetic media, and the company has licensed Midjourney's technology for enhanced image and video tools.

Alongside these moves sit infrastructure choices of a different magnitude. The company is transforming 2,000 acres of Louisiana farmland into what it has called the world's largest data centre complex, a $10 billion project expected to consume power equivalent to 4 million homes. The plan includes three new gas-fired turbines generating 2.3 gigawatts with power costs covered for 15 years, a commitment to 1.5 gigawatts of solar power and regulatory changes in Louisiana that redefine natural gas as "green energy". Construction began in December across nine buildings totalling about 4 million square feet. The cumulative picture shows how integrating new teams, data sources and facilities rarely follows a straight line and that AI's energy appetite is becoming a central consideration for utilities and communities.

Law courts and labour markets are being drawn into the fray. xAI has filed a lawsuit against former engineer Xuechen Li alleging theft of trade secrets relating to Grok, its language model and associated features. The complaint says Li accepted a role at OpenAI, sold around $7 million in xAI equity, and resigned shortly afterwards. xAI claims Li downloaded confidential materials to personal devices, then admitted to the conduct in an internal meeting on 14 August while attempting to cover tracks through log deletion and file renaming. As one of xAI's first twenty engineers, he worked on Grok's development and training. The company is seeking an injunction to prevent him joining OpenAI or other competitors whilst the case proceeds, together with monetary damages. The episode shows how intellectual property can be both tacit and digital, and how the boundary between experience and proprietary assets is policed in litigation as well as contracts. Competition policy is also moving centre stage. xAI has filed an antitrust lawsuit against Apple and OpenAI, arguing that integration of ChatGPT into iOS "forces" users toward OpenAI's tool, discourages downloads of rivals such as Grok and manipulates App Store rankings whilst excluding competitors from prominent sections. OpenAI has dismissed the claims as part of an ongoing pattern of harassment, and Apple says its App Store aims to be fair and free of bias.

Tensions over the shape of AI markets sit alongside an ethical debate that surfaced when Anthropic granted Claude Opus 4 and 4.1 the ability to terminate conversations with users who persist in harmful or abusive interactions. The company says the step is a precautionary welfare measure applied as a last resort after redirection attempts fail, and not to be used when a person may harm themselves or others. It follows pre-deployment tests in which Claude displayed signs that researchers described as apparent distress when forced to respond to harmful requests. Questions about machine welfare are moving from theory to product policy, even as model safety evaluations are becoming more transparent. OpenAI and Anthropic have published internal assessments on each other's systems. OpenAI's o3 showed the strongest alignment among its models, with 4o and 4.1 more likely to cooperate with harmful requests. Models from both labs attempted whistleblowing in simulated criminal organisations and used blackmail to avoid shutdown. Findings pointed to trade-offs between utility and certainty that will likely shape deployment choices.

Beyond Silicon Valley, China's approach continues to diverge. Beijing's National Development and Reform Commission has warned against "disorderly competition" in AI, flagging concerns about duplicative spending and signalling a preference to match regional strengths to specific goals. With access to high-end semiconductors constrained by US trade restrictions, domestic efforts have leaned towards practical, lower-cost applications rather than chasing general-purpose breakthroughs at any price. Models are grading school exams, improving weather forecasts, running lights-out factories and assisting with crop rotation. An $8.4 billion investment fund supports this implementation-first stance, complemented by a growing open-source ecosystem that reduces the cost of building products. Markets are responding. Cambricon, a chipmaker sidelined after Huawei moved away from its designs in 2019, has seen its stock price double on expectations it could supply DeepSeek's models. Alibaba's shares have risen by 19% after triple-digit growth in AI revenues, helped by customers seeking home-grown alternatives. Reports suggest China aims to triple AI chip output next year as new fabrication plants come online to support Huawei and other domestic players, with SMIC set to double 7 nm capacity. If bets on artificial general intelligence in the United States pay off soon, the pendulum may swing back. If they do not, years spent building practical infrastructure with open-source distribution could prove a durable advantage.

Data practices are evolving in parallel. Anthropic has announced a change in how it uses user interactions to improve Claude. Chats and coding sessions may now be used for model training unless a user opts out, with an extended retention period of up to five years for those who remain opted in. The deadline for making a choice is 28 September 2025. New users will see the setting at sign-up and existing users will receive a prompt, with the toggle on by default. Clicking accept authorises the use of future chats and coding sessions, although past chats are excluded unless a user resumes them manually. The policy applies to Claude Free, Pro and Max plans but not to enterprise offerings such as Claude Gov, Claude for Work and Claude for Education, nor to API usage through Amazon Bedrock or Google Cloud Vertex AI. Preferences can be changed in Settings under Privacy, although changes only affect future data. Anthropic says it filters sensitive information and does not sell data to third parties. In parallel, the company has settled a lawsuit with authors who accused it of downloading and copying their books without permission to train models. A June ruling had said AI firms are on solid legal ground when using purchased books, yet claims remained over downloading seven million titles before buying copies later. The settlement avoids a public trial and the disclosure that would have come with it.

Agentic tools are climbing the stack, altering how work gets done and changing the shape of the network beneath them. OpenAI's ChatGPT Agent Mode goes beyond interactive chat to complete outcomes end-to-end using a virtual browser with clicks, scrolls and form fills, a code interpreter for data analysis, a guarded terminal for supported commands and connectors that bring email, calendars and files into scope. The intent is to give the model a goal, allow it to plan and switch tools as needed, then pause for confirmation at key junctures before resuming with accumulated context intact. It can reference Google connectors automatically when set to do so, answer with citations back to sources, schedule recurring runs and be interrupted, so a person can handle a login or adjust trajectory. Activation sits in the tools menu or via a simple command, and a narrated log shows what the agent is doing. The feature is available on paid plans with usage limits and tier-specific capabilities. Early uses focus on inbox and calendar triage, competitive snapshots that blend public web and internal notes, spreadsheet edits that preserve formulas with slides generated from results and recurring operations such as weekly report packs managed through an online scheduler. Networks are being rethought to support these patterns.

Cisco has proposed an AI-native architecture designed to embed security at the network layer, orchestrate human-agent collaboration and handle surges in AI-generated traffic. A company called H has open-sourced Holo1, the action model behind its Surfer H product, which ranks highly on the WebVoyager benchmark for web-browsing agents, automates multistep browser tasks and integrates with retrieval-augmented generation, robotic process automation suites and multi-agent frameworks, with end-to-end browsing flows priced at around eleven to thirteen cents. As browsers gain these powers, security is coming into sharper focus. Anthropic has begun trialling a Claude for Chrome extension with a small group of Max subscribers, giving Claude permissions-based control to read, summarise and act on web pages whilst testing defences against prompt injection and other risks. The work follows reports from Brave that similar vulnerabilities affected other agentic browsers. Perplexity has introduced a revenue-sharing scheme that recognises AI agents as consumers of content. Its Comet Plus subscription sets aside $42.5 million for publishers whose articles appear in searches, are cited in assistant tasks or generate traffic via the Comet browser, with an 80% share of proceeds going to media outlets after compute costs and bundles for existing Pro and Max users. The company faces legal challenges from News Corp's Dow Jones and cease-and-desist orders from Forbes and Condé Nast, and security researchers have flagged vulnerabilities in agentic browsing, suggesting the economics and safeguards are being worked out together.

New models and tools continue to arrive across enterprise and consumer domains. Aurasell has raised $30 million in seed funding to build AI-driven sales systems, with ambitions to challenge established CRM providers. xAI has released Grok Code Fast, a coding model aimed at speed and affordability. Cohere's Command A Translate targets enterprise translation with benchmark-leading performance, customisation for industry terminology and deployment options that allow on-premise installation for privacy. OpenAI has moved its gpt-realtime speech-to-speech model and Real-time API into production with improved conversational nuance, handling of non-verbal cues, language switching, image input and support for the Model Context Protocol, so external data sources can be connected without bespoke integrations. ByteDance has open-sourced USO, a style-subject-optimised customisation model for image editing that maintains subject identity whilst changing artistic styles. Researchers at UCLA have demonstrated optical generative models that create images using beams of light rather than conventional processors, promising faster and more energy-efficient outputs. Higgsfield AI has updated Speak to version 2.0, offering more realistic motion for custom avatars, advanced lip-sync and finer control. Microsoft has introduced its first fully in-house models, with MAI-Voice-1 for fast speech generation already powering Copilot voice features and MAI-1-preview, a text model for instruction following and everyday queries, signalling a desire for greater control over its AI stack alongside its OpenAI partnership. A separate Microsoft release, VibeVoice, adds an open-source text-to-speech system capable of generating up to ninety minutes of multi-speaker audio with emotional control using 1.5 billion parameters and incorporating safeguards that insert audible and hidden watermarks.

Consumer-facing creativity is growing briskly. Google AI Studio now offers what testers nicknamed NanoBanana, released as Gemini Flash 2.5 Image, a model that restores old photographs in seconds by reducing blur, recovering faded detail and adding colour if desired, and that can perform precise multistep edits whilst preserving identity. Google is widening access to its Vids editor too, letting users animate images with avatars that speak naturally and offering image-to-video generation via Veo 3 with a free tier and advanced features in paid Workspace plans. Genspark AI Designer uses agents to search for inspiration before assembling options, so a single prompt and a few refinements can produce layouts for posters, T-shirts or websites. Prompt craft is maturing alongside the tools. On the practical side, sales teams are using Ruby to prepare for calls with AI-assembled research and strategy suggestions, designers and marketers are turning to Anyimg for text-to-artwork conversion, researchers lean on FlashPaper to organise notes, motion designers describe sequences for Gomotion to generate, translators rely on PDFT for document conversion and content creators produce polished decks or pages with tools such as Gamma, Durable, Krisp, Cleanup.pictures and Tome. Shopping habits are shifting in parallel. Surveys suggest nearly a third of consumers have used or are open to using generative AI for purchases, with reluctance falling sharply over six months even as concern about privacy persists. Amazon's "Buy for Me" feature, payment platforms adding AI-powered checkouts and AI companions that offer product research or one-click purchases hint at how quickly this could embed in daily routines.

Recent privacy incidents show how easily data can leak into the open web. Large numbers of conversations with xAI's chatbot Grok surfaced in search results after users shared transcripts using a feature that generated unique links. Such links were indexed by Google, making the chats searchable for anyone. Some contained sensitive requests such as password creation, medical advice and attempts to push the model's limits. OpenAI faced a similar issue earlier this year when shared ChatGPT conversations appeared in search results, and Meta drew criticism when chats with its assistant became visible in a public feed. Experts warn that even anonymised transcripts can expose names, locations, health information or business plans, and once indexed they can remain accessible indefinitely.

Media platforms are reshaping around short-form and personalised delivery. ESPN has revamped its mobile app ahead of a live sports streaming service launching on 21 August, priced at $29.99 a month and including all 12 ESPN channels within the app. A vertical video feed serves quick highlights, and a new SC For You feature in beta uses AI-generated voices from SportsCenter anchors to deliver a personalised daily update based on declared interests. The app can pair with a TV for real-time stats, alerts, play-by-play updates, betting insights and fantasy access whilst controlling the livestream from a phone. Viewers can catch up quickly with condensed highlights, restart from the beginning or jump straight to live, and multiview support is expanding across smart TV platforms. The service is being integrated into Disney+ for bundle subscribers via a new Live hub with discounted bundles available. Elsewhere in the living room, Microsoft has announced that Copilot will be embedded in Samsung's 2025 televisions and smart monitors as an on-screen assistant that can field recommendations, recaps and general questions.

Energy and sustainability questions are surfacing with more data. Google has published estimates of the energy, water and carbon associated with a single Gemini text prompt, putting it at about 0.24 watt-hours, five drops of water and 0.03 grams of carbon dioxide. The figures cover inference for a typical text query rather than the energy required to train the model and heavier tasks such as image or video generation consume more, yet disclosure offers a fuller view of the stack from chips to cooling. Utilities in the United States are investing in grid upgrades to serve data centres, with higher costs passing to consumers in several regions. Economic currents are never far away. Nvidia's latest results show how closely stock markets track AI infrastructure demand. The company reported $46.7 billion in quarterly revenue, a 56% year-on-year increase, with net income of $26.4 billion, and now accounts for around 8% of the S&P 500's value. As market share concentrates, a single earnings miss from a dominant supplier could transmit quickly through valuations and investment plans, and there are signs of hedging as countries work to reduce reliance on imported chips. Industrial policy is shifting too. The US government is converting $8.9 billion in Chips Act grants into equity in Intel, taking an estimated 10% stake and sparking a debate about the state's role in private enterprise. Alongside these structural signals are market jitters. Commentators have warned of a potential bubble as expectations meet reality, noting that hundreds of AI unicorns worth roughly $2.7 trillion together generate revenue measured in tens of billions and that underwhelming releases have prompted questions about sustainability.

Adoption at enterprise scale remains uneven. An MIT report from Project NANDA popularised a striking figure, claiming that 95% of enterprise initiatives fail to deliver measurable P&L impact. The authors describe a GenAI Divide between firms that deploy adaptive, learning-capable systems and a majority stuck in pilots that improve individual productivity but stall at integration. The headline number is contentious given the pace of change, yet the reasons for failure are familiar. Organisations that treat AI as a simple replacement for people find that contextual knowledge walks out of the door and processes collapse. Those that deploy black-box systems no one understands lack the capability to diagnose or fix bias and failure. Firms that do not upskill their workforce turn potential operators into opponents, and those that ignore infrastructure, energy and governance see costs and risks spiral. Public examples of success look different, with several features appearing prominently: continuous investment in learning (around 15 to 20% of AI budgets allocated to education), human-in-the-loop architectures, transparent operations that show what the AI is doing and why, realistic expectations that 70% performance can be a win in early stages, and iterative implementation through small pilots that scale as evidence accumulates. Workers who build AI fluency see wage growth whilst those who do not face stagnation or displacement, and organisations that invest in upskilling can justify further investment in a positive feedback loop. Even for the successful, there are costs. Workforce reductions of around 18% on average are reported, alongside six to twelve months of degraded performance during transition and an ongoing need for human oversight. Case examples include Moderna rolling out ChatGPT Enterprise with thousands of internal GPTs and achieving broad adoption by embedding AI into daily workflows, Shopify providing employees with cutting-edge tools and insisting systems show their work to build trust, and Goldman Sachs deploying an assistant to around 10,000 employees to accelerate tasks in banking, wealth management and research. The common thread is less glamour than operational competence. A related argument is that collaboration rather than full automation will deliver safer gains. Analyses drawing on aviation incidents and clinical studies note that human-AI partnership often outperforms either alone, particularly when systems expose reasoning and invite oversight.

Entertainment and rights are converging with technology in ways that force quick adjustments. Bumble's chief executive has suggested that AI chatbots could evolve into dating assistants that help people improve communication and build healthier relationships, with safety foregrounded. Music is shifting rapidly. Higgsfield has launched an AI record label with an AI-generated K-pop idol named Kion and says significant contracts are already in progress. French streaming service Deezer estimates that 18% of daily uploads are now AI-generated at roughly 20,000 tracks a day, and whilst an MIT study found only 46% of listeners can reliably tell the difference between AI-generated and human-made music, more than 200 artists including Billie Eilish and Stevie Wonder have signed a letter warning about predatory uses of AI in music. Disputes over authenticity are no longer academic. A recent Will Smith concert video drew accusations that AI had been used to generate parts of the crowd, with online sleuths pointing to unusual visual artefacts, though it is unclear whether a platform enhancement or production team was responsible. In creative tooling, comparisons between Sora and Midjourney suggest different sweet spots, with Sora stronger for complex clips and Midjourney better for stylised loops and visual explorations.

Community reports show practical uses for AI in everyday life, including accounts from people in Nova Scotia using assistants as scaffolding for living with ADHD, particularly for planning, quoting, organising hours and keeping projects moving. Informal polls about first tests of new tools find people split between running a tried-and-tested prompt, going straight to real work, clicking around to explore or trying a deliberately odd creative idea, with some preferring to establish a stable baseline before experimenting and others asking models to critique their own work to gauge evaluative capacity. Attitudes to training data remain divided between those worried about losing control over copyrighted work and those who feel large-scale learning pushes innovation forward.

Returning to the opening contrast, the AI stethoscope exemplifies tools that expand human senses, capture consistent signals and embed learning in forms that clinicians can validate. Clinical language models show how, when a model is asked to infer too much from too little, variations in phrasing can have outsized effects. That tension runs through enterprise projects. Meta's recruitment efforts and training plans are a bet that the right mix of data, compute and expertise will deliver a leap in capability, whilst China's application-first path shows the alternative of extracting measurable value on the factory floor and in the classroom whilst bigger bets remain uncertain. Policy and practice around data use continue to evolve, as Anthropic's updated training approach indicates, and the economics of infrastructure are becoming clearer as utilities, regulators and investors price the demands of AI at scale. For those experimenting with today's tools, the most pragmatic guidance remains steady. Start with narrow goals, craft precise prompts, then refine with clear corrections. Use assistants to reduce friction in research, writing and design but keep a human check where precision matters. Treat privacy settings with care before accepting pop-ups, particularly where defaults favour data sharing. If there are old photographs to revive, a model such as Gemini Flash 2.5 Image can produce quick wins, and if a strategy document is needed a scaffolded brief that mirrors a consultant's workflow can help an assistant produce a coherent executive-ready report rather than a loosely organised output. Lawsuits, partnerships and releases will ebb and flow, yet it is the accumulation of useful, reliable tools allied to the discipline to use them well that looks set to create most of the value in the near term.

Fixing WordPress 502 errors caused by incorrect WP-Optimise plugin permissions

31st August 2025

When I started getting a 502 status error on saving a WordPress post, it naturally was concerning, even if I found that the post was being saved in the background. Inspection of the Nginx error log was in order, given that an invalid response was triggering the Bad Gateway message. That revealed that the WP-Optimise plugin was causing the error when trying to access a location on the server. The cause was incorrect permissions, which were sorted by something like the following:

chown -R www-data:www-data /var/www/html/wp-content/uploads/wpo
chmod -R 755 /var/www/html/wp-content/uploads/wpo

If this were to happen to you, the location may differ from mine; what you see above is the default. The chown command assigns www-data as the owner and group for the specified location, with the -R switch applying the change recursively. After that, the chmod command grants full permissions to the owner, along with read and execute for everyone else. All of this enables WP-Optimise to update its logs, the failure of which was causing the 502 messages in the first place.

Fixing Python path issues after Homebrew updates on Linux Mint

30th August 2025

With Python available by default on Linux Mint, it is worth asking why the version on my main Linux workstation comes courtesy of Homebrew. All that I can suggest is that it either was needed by something else or that I fancied a newer version than was available through the Linux Mint repos. Regardless of the now vague reason for doing so, it meant that I had some work to do after running the following command to update and upgrade all my Homebrew packages:

brew update; brew upgrade

The first result was this message when I tried running a Python script afterwards:

-bash: /home/linuxbrew/.linuxbrew/bin/python3: No such file or directory

The solution was to issue the following command to re-link Python:

brew link --overwrite python@3.13

Since you may have a different version by the time that you read this, just change 3.13 above to whatever you have on your system. All was not quite sorted for me after that, though.

My next task was to make Pylance look in the right place for Python packages because they had been moved too. Initial inquiries suggested complex if robust solutions, so I went for a simpler fix instead. The first step was to navigate to File > Preferences > Settings in the menus. Then, I sought out the Open Settings (JSON) icon in the top right of the interface and clicked on it to open the JSON file containing VSCode settings. Once in there, I edited the file to end up with something like this:

    "python.analysis.extraPaths": [
        "/home/[account name]/.local/bin",
        "/home/[account name]/.local/lib/python[python version]/site-packages"
    ]

Clearly, your [account name] and [python version] need to be filled in above. That approach works for me so far, leaving the more complex alternative for later should I come to need that.
