Command line installation and upgrading of VSCode and VSCodium on Windows, macOS and Linux
Downloading and installing software packages from a website is all very well until you need to update them. Then, a single command streamlines the process significantly. Given that VSCode and VSCodium are updated regularly, this becomes all the more pertinent and explains why I chose them for this piece.
Windows
Now that Windows 10 is more or less behind us, we can focus on Windows 11. That comes with the winget command by default, which is handy because it allows command line installation of anything in its package repositories; these include the Microsoft Store as well as a community-maintained manifest collection that covers both VSCode and VSCodium. The commands can be as simple as these:
winget install Microsoft.VisualStudioCode
winget install VSCodium.VSCodium
The above is shorthand for this, though:
winget install --id Microsoft.VisualStudioCode
winget install --id VSCodium.VSCodium
If you want exact matches, the above then becomes:
winget install -e --id Microsoft.VisualStudioCode
winget install -e --id VSCodium.VSCodium
For upgrades, this is what is needed:
winget upgrade Microsoft.VisualStudioCode
winget upgrade VSCodium.VSCodium
Even better, you can upgrade everything at once:
winget upgrade --all
The last part certainly beats the round trip to a website followed by a run through an installation GUI. There is a lot less mouse clicking, for one thing.
macOS
On macOS, you need to have Homebrew installed to make things more streamlined. To set that up, run the following command (which may ask for your system password along the way):
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Then, you can execute one or both of these in the Terminal app, perhaps having to authorise everything with your password when requested to do so:
brew install --cask visual-studio-code
brew install --cask vscodium
The reason for the --cask switch is that these are full applications that need to go into the correct locations on macOS, as well as having their icons appear in Launchpad. Omitting it is fine for command line utilities, but not for these.
To update and upgrade everything that you have installed via Homebrew, just issue the following in a terminal session:
brew update && brew upgrade
Debian, Ubuntu & Linux Mint
Like any other Debian or Ubuntu derivative, Linux Mint has its own in-built package management system via apt. Other Linux distributions have their own way of doing things (Fedora and Arch come to mind here), yet the essential idea is similar in many cases. Because there are a number of steps, I have split out VSCode from VSCodium for added clarity. Because of the way that things are set up, one or both apps can be updated using the usual apt commands without individual attention.
VSCode
The first step is to download the repository key and install it where apt can find it, using the following commands:
wget -qO- https://packages.microsoft.com/keys/microsoft.asc \
| gpg --dearmor > packages.microsoft.gpg
sudo install -D -o root -g root -m 644 packages.microsoft.gpg /etc/apt/keyrings/packages.microsoft.gpg
Then, you can add the repository like this:
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/packages.microsoft.gpg] \
https://packages.microsoft.com/repos/code stable main" \
| sudo tee /etc/apt/sources.list.d/vscode.list
With that in place, the last thing that you need to do is issue the command for doing the installation from the repository:
sudo apt update; sudo apt install code
Above, I have put two commands together: one to refresh the package index and another to do the installation.
VSCodium
Since the VSCodium process is similar, here are the three commands together: one for downloading the repository key, another that adds the new repository and one more to perform the repository updates and subsequent installation:
curl -fSsL https://gitlab.com/paulcarroty/vscodium-deb-rpm-repo/raw/master/pub.gpg \
| sudo gpg --dearmor | sudo tee /usr/share/keyrings/vscodium-archive-keyring.gpg >/dev/null
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/vscodium-archive-keyring.gpg] \
https://download.vscodium.com/debs vscodium main" \
| sudo tee /etc/apt/sources.list.d/vscodium.list
sudo apt update; sudo apt install codium
After the three steps have completed successfully, VSCodium is installed and available to use on your system, and is accessible through the menus too.
A more elegant way to read and combine data from multiple CSV files in Julia
When I was compiling financial information for my accountant recently, I needed to read in a number of CSV files and combine their contents to enable further processing. This was all in a Julia script, and there was a time when I would use an explicit loop to do this combination. However, I came across a better way to accomplish this that I am sharing here with you now. First, you need to define a list of files like this:
files = ["5-2024.csv", "6-2024.csv", "7-2024.csv", "9-2024.csv", "10-2024.csv", "11-2024.csv", "12-2024.csv", "1-2025.csv", "2-2025.csv", "3-2025.csv", "4-2025.csv"]
While there are alternatives to the above, including globbing (using wildcards via a Julia package that supports them), I decided to keep things simple for myself. Now we come to the line that does all the heavy lifting (dir is a string holding the path to the folder containing the files):
using CSV, DataFrames

df = vcat([CSV.read(dir * file, DataFrame; normalizenames=true, header=5, skipto=6, silencewarnings=true) for file in files]...)
Near the end, there is the list comprehension ([... for file in files]) that avoids the need for the explicit loop that I have used a few times in the past. This loops through each file in the list defined at the top, reading it into a data frame as directed by the DataFrame argument. The normalizenames option replaces spaces with underscores and cleans up any invalid characters. The header and skipto options tell Julia where to find the column headings and where to start reading the file, respectively. Then, the silencewarnings option suppresses any warnings about missing columns or inconsistent rows; clearly, a check on the data frame is needed to ensure that all is in order if you wish to go the same route as I did.
The splat (...) operator takes the resulting list of data frames and converts them into individual arguments passed to the vcat function, which vertically concatenates them to create the df data frame. Just like suppressing warnings about missing columns or inconsistent rows during CSV file reading, this involves trusting that the input data are all structured alike. Naturally, you need to do your own checks to ensure that is the case, as I did with what I had to process.
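As a minimal version of the checks just mentioned, something like this confirms that the combined data frame has the expected shape and column contents:

println(size(df))      # row and column counts after concatenation
println(describe(df))  # per-column summary from DataFrames.jl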
Loading API Keys from Linux shell environment variables in Python with Dotenv
Recently, I ran into trouble with getting Python to pick up an API key that I had defined in the underlying bash environment. This was within a Python console running inside the Positron IDE for R and Python scripting. Opening up the folder containing my Python scripts within the IDE was part of the solution. The next part was creating a .env file within the same folder. A line like this was added within the new file:
export API_KEY="<API key value>"
That meant that code like the following then read in the API key in a more robust manner:
import os
from dotenv import load_dotenv
load_dotenv()
api_key = os.getenv('API_KEY', 'default_value')
This imports the os module and the load_dotenv method from the dotenv package. Then, load_dotenv is executed to load the .env file and its contents. After that, the os.getenv function can assign the API key to a Python variable from the value of the environment variable.
Since this also was within a Git repository, a .gitignore file needed creating with the contents .env to avoid that file being uploaded to GitHub, which is the last place where you should be storing credentials like passwords, passphrases and API keys. While my repository may be private, the state of things in these troubled times means that even that is no failsafe.
Taking control of Ruff checks on Python scripts
Positron is becoming my tool of choice for developing Python code. Along with using a Python console as a REPL environment, it also includes Ruff for checking code compliance. One of its rules is that Python module imports must be declared at the top of a script. However, I want to use some code that checks for the presence of any modules used in a script, installing those that are missing. This means that import statements appear later in a script than Ruff recommends, making me wish for a way to turn off that check, since things run well anyway. The chosen solution is to create a file called pyproject.toml in the directory where my scripts are stored and add the following lines in there to accomplish what I want:
[tool.ruff.lint]
ignore = ["E402"]
Here, it helps if you open a folder in Positron, achieving the same outcome as you would in VSCode, on which the IDE is based. While I have only listed one check here, you can also have a comma-delimited list of quoted strings if you need to switch off more than one rule at once.
Controlling the version of Python used in the Positron console with virtual environments
Because I have Homebrew installed on my Linux system for getting Hugo and LanguageTool on there, I also have a later version of Python than is available from the Linux Mint repositories. Both 3.12 and 3.13 are on my machine as a consequence. Here is the line in my .bashrc file that makes that happen:
eval "$(/home/linuxbrew/.linuxbrew/bin/brew shellenv)"
The result is that, when I issue the command which python3, this is what I get:
/home/linuxbrew/.linuxbrew/bin/python3
However, Positron looks to /usr/bin/python3 by default. Since this can get confusing, setting up a virtual environment has its uses, as long as you create it with the intended Python version. This is how you can do it (for some reason, I needed sudo on my system):
python3 -m venv .venv
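If the default python3 is not the version you want, invoking the intended interpreter directly creates the environment with that version; here is a sketch using the Homebrew path from earlier:

/home/linuxbrew/.linuxbrew/bin/python3 -m venv .venv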
When working solely on the command line, activating it becomes a necessity, adding another manual step for someone who had resisted all this until recently:
source .venv/bin/activate
Thankfully, just issuing the deactivate command will do the eponymous action. Even better, just opening a folder with a venv in Positron saves you from issuing the extra commands and grants you the desired Python version in the console that it opens. Having run into some clashes between package versions, I am beginning to appreciate having a dedicated environment for a set of Python scripts, especially when an IDE makes it easy to work with such an arrangement.
Avoiding Python missing package errors with automatic installation checks
Though some may not like having something preceding package import statements in Python scripts, I prefer the added robustness of an extra piece of code checking for package presence and installing anything that is missing, in place of getting an error. In what follows, I define the list of packages that need to be present for everything to work:
required_packages = ["pandas", "tqdm", "progressbar2", "sqlalchemy", "pymysql"]
Then, I declare the inbuilt modules in advance of looping through the list that was already defined (adding special handling for a case where there has been a name change):
import subprocess
import sys

for package in required_packages:
    try:
        __import__(package if package != "progressbar2" else "progressbar")
        print(f"{package} is already installed.")
    except ImportError:
        print(f"{package} not found. Installing...")
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])
The above code tries importing the package and catches the error to do the required installation. While a stable environment may be a better way around all of this, I find that this way of working adds valuable robustness to a script and automates what you would need to do anyway. Though the use of requirements files and even the Poetry tool for dependency management may be next steps, this approach suffices for my simpler needs, at least when it comes to personal projects.
Some Data Science newsletters that may be worth your time
Staying informed about developments in data science and artificial intelligence without drowning in an endless stream of blog posts, research papers and tool announcements presents a genuine challenge for practitioners. The newsletters profiled below offer a solution to this problem by delivering curated digests at weekly or near-weekly intervals, filtering what matters from the constant flow of new content across the field. Each publication serves a distinct purpose, from broad data science coverage and community event notifications to AI business strategy and statistical foundations, allowing readers to select resources that match their specific interests, whether technical depth, practical application, career development or strategic awareness. What follows examines what each newsletter offers, who benefits most from subscribing, and what limitations or trade-offs readers should consider when choosing which digests merit a place in their inbox.
Data Elixir
Launched in 2014 by Lon Reisberg, this newsletter distinguishes itself through expert curation with minimal hype. It maintains strong editorial consistency and neutrality, presenting a handful of carefully selected articles that genuinely matter rather than overwhelming subscribers with dozens of links. The free version delivers this curated digest, whilst the Pro tier (fifty dollars annually) offers searchable archives spanning over 250 issues back to 2019, plus AI-powered learning tools including a SQL tutor and interview coach. The newsletter's defining characteristic is its quality-over-quantity approach, serving professionals who trust expert curation to surface what is genuinely important without the noise and hype that characterises many industry publications.
Data Science Weekly Newsletter
One of the oldest independent data science newsletters, having published over 400 issues since 2014, this publication sets itself apart through longevity and unwavering consistency. It delivers every Thursday without fail, maintaining a simple, distraction-free format with no over-commercialisation or fluff. Its unique value lies in this dependability, with subscribers knowing exactly what to expect each week, making it a practical baseline for staying current without surprises or dramatic shifts in editorial direction.
DataTalks.Club
Unlike newsletters that simply curate external content, this publication builds its own ecosystem of learning resources, offering something fundamentally different through its open, community-driven approach. It combines free courses (Zoomcamps), events and a supportive Slack community, with all materials publicly available on GitHub. The newsletter keeps members informed about upcoming cohorts, webinars and talks within this collaborative environment. The defining feature is its entirely open and peer-supported approach, where readers gain access not just to information, but to hands-on learning opportunities and a community of practitioners willing to help each other grow.
KDnuggets
Founded in 1997 by Gregory Piatetsky-Shapiro, this publication stands apart through industry authority spanning nearly three decades. It holds unmatched credibility through its longevity and comprehensive coverage, known for its annual software polls, data science career resources and balanced mix of expert articles, surveys and tool trends that appeal equally to technical practitioners and managers seeking a global overview of the field. What sets it apart is this authoritative position, with few publications able to match its track record or breadth of influence across both technical and strategic aspects of data science and AI.
ODSC Newsletter
Connected to the Open Data Science Conference network, this newsletter distinguishes itself as the gateway to the global data science event ecosystem. It serves as the practitioner's bridge to events, training, webinars and conferences worldwide. It covers the full stack, from tutorials and research to business use cases and career advice, but its distinctive strength lies in connecting readers to the broader data science community through live events and practical learning opportunities. The defining characteristic is this conference-linked, community-rich approach, proving especially valuable for professionals who want to remain active participants in the field rather than passive consumers of content.
This newsletter maintains a unique position by focusing entirely on statistical foundations. Whilst most data science newsletters chase the latest AI developments, it keeps an unwavering focus on statistics and foundational analysis, providing step-by-step tutorials for Excel, R and Python that emphasise statistical intuition over trendy techniques. This singular focus on fundamentals makes it unique, serving as an essential complement to AI-focused newsletters and helping readers build the statistical knowledge base that underpins sound data science practice.
Created by the makers of KDnuggets, this digital newsletter and media platform carves out a distinctive niche with business-focused AI news for non-technical leaders. It curates AI developments specifically for executives and decision-makers, emphasising practical, non-technical insights about tools, regulations and market moves, backed by an AI tool database and a claimed community of over 400,000 subscribers. What sets it apart is this strategic, implementation-focused perspective, concentrating on what AI means for business strategy rather than explaining how AI works, making it accessible to leaders without deep technical backgrounds.
The Batch
Published weekly by DeepLearning.AI, co-founded by Andrew Ng, this newsletter offers trusted commentary that combines AI news with insightful analysis. Written by leading experts, it provides a balanced view that merges academic grounding with applied, real-world context. The distinguishing feature is this authoritative perspective on implications, helping engineers, product teams and business leaders understand why developments matter and how to think about their practical impact rather than simply reporting what happened.
Managing Python projects with Poetry
Python Poetry has become a popular choice for managing Python projects because it unifies tasks that once required several tools. Instead of juggling pip for installation, virtualenv for isolation and setuptools for packaging, Poetry brings these strands together and aims to make everyday development feel predictable and tidy. It sits in the same family of all-in-one managers as npm for JavaScript and Cargo for Rust, offering a coherent workflow that spans dependency declaration, environment management and package publishing.
At the heart of Poetry is a simple idea: declare what a project needs in one place and let the tool do the orchestration. Projects describe their dependencies, development tools and metadata in a single configuration file, and Poetry ensures that what is installed on one machine can be replicated on another without nasty surprises. That reliability comes from the presence of a lock file. Once dependencies are resolved, their exact versions are recorded, so future installations repeat the same outcome. The intent here is not only convenience but determinism, helping teams avoid the "works on my machine" refrain that haunts software work.
Core Concepts: Configuration and Lock Files
Two files do the heavy lifting. The pyproject.toml file is where a project announces its name, version and description, as well as the dependencies required to run and to develop it. The poetry.lock file captures the concrete resolution of those requirements at a particular moment. Together, they give you an auditable, repeatable picture of your environment. The structure of TOML keeps the configuration readable, and it spares developers from spreading equivalent settings across setup.cfg, setup.py and requirements.txt. A minimal example shows how this looks in practice.
[tool.poetry]
name = "my_project"
version = "0.1.0"
description = "Example project using Poetry"
authors = ["John <john@example.com>"]
[tool.poetry.dependencies]
python = "^3.10"
requests = "^2.31.0"
[tool.poetry.group.dev.dependencies]
pytest = "^8.0.0"
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
Essential Commands
Working with Poetry day to day quickly becomes a matter of a few memorable commands. Initialising a project configuration starts with poetry init, which steps through the creation of pyproject.toml interactively. Adding a dependency is handled by poetry add followed by the package name. Installing everything described in the configuration is done with poetry install, which writes or updates the lock file. When it is time to refresh dependencies within permitted version ranges, poetry update re-resolves and updates what's installed. Removing a dependency is poetry remove, followed by the package name. For environment management, poetry shell opens a shell inside the virtual environment managed by Poetry, and poetry run allows execution of commands within that same environment without entering a shell. Building distributions is as simple as poetry build, which produces a wheel and a source archive, and publishing to the Python Package Index is managed by poetry publish with credentials or an API token.
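Put together, a typical session using the commands just described might look like this; the package name is illustrative:

poetry init                   # create pyproject.toml interactively
poetry add requests           # add a dependency
poetry install                # install everything and write poetry.lock
poetry update                 # refresh within permitted version ranges
poetry remove requests        # drop a dependency
poetry run pytest             # run a command inside the environment
poetry build                  # produce a wheel and a source archive
poetry publish                # upload using configured credentials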
Advantages and Considerations
There are clear advantages to taking this route. The dependency experience is simplified because you do not need to keep updating a requirements.txt file by hand. With a lock file in place, environments are reproducible across developer machines and continuous integration runners, which stabilises builds and testing. Packaging is integrated rather than an extra chore, so producing and publishing a release becomes a repeatable process that sits naturally alongside development. Virtual environments are created and activated on demand, keeping projects isolated from one another with little ceremony. The configuration in TOML has the benefit of being structured and human-readable, which reduces the likelihood of configuration drift.
There are also points to consider before adopting Poetry. Projects that are deeply invested in setup.py or complex legacy build pipelines may need a clean migration to pyproject.toml to avoid clashes. Developers who prefer manual venv and pip workflows can find Poetry opinionated at first because it expects to be responsible for the environment and dependency resolution. It is also designed with modern Python versions in mind, with examples here using Python 3.10.
Migration from pip and requirements.txt
For teams arriving from pip and requirements.txt, moving to Poetry can be done in measured steps. The starting point is installation. Poetry provides an installer script that sets up the tool for your user account.
curl -sSL https://install.python-poetry.org | python3 -
If the installer does not add Poetry to your PATH, adding $HOME/.local/bin to PATH resolves that, after which poetry --version confirms the installation. From the root of your existing project, poetry init creates a new pyproject.toml and invites you to provide metadata and dependencies. If you already maintain requirements.txt files for production and development dependencies, Poetry can ingest those in one sweep. A single file can be imported with poetry add $(cat requirements.txt). Where development dependencies live in a separate file, they can be added into Poetry's dev group with poetry add --group dev $(cat dev-requirements.txt). Once added, Poetry resolves and pins exact versions, leaving a lock file behind to capture the resolution. After verifying that everything installs and tests pass, it becomes safe to retire earlier environment artefacts. Many teams remove requirements.txt entirely if they plan to rely solely on Poetry, deleting any remnants of Pipfile and Pipfile.lock that were left by Pipenv, and migrating metadata away from setup.py or setup.cfg in favour of pyproject.toml. With that done, using the environment becomes routine. Opening a shell inside the virtual environment with poetry shell makes commands such as python or pytest use the isolated interpreter. If you prefer to avoid entering a shell, poetry run python script.py or poetry run pytest executes the command in the right context.
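Condensed into one possible sequence, the migration described above might run as follows (the file names assume the conventional requirements.txt and dev-requirements.txt):

curl -sSL https://install.python-poetry.org | python3 -
poetry --version
poetry init
poetry add $(cat requirements.txt)
poetry add --group dev $(cat dev-requirements.txt)
poetry install
poetry run pytest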
Package Publishing
Publishing a package is one of the areas where Poetry streamlines the steps. Accurate metadata in pyproject.toml is important, so name, version, description and other fields should be up-to-date. An example configuration shows commonly used fields.
[tool.poetry]
name = "example-package"
version = "1.0.0"
description = "A simple example package"
authors = ["John <john@example.com>"]
license = "MIT"
readme = "README.md"
homepage = "https://github.com/john/example-package"
repository = "https://github.com/john/example-package"
keywords = ["example", "poetry"]
With metadata set, building the distribution is handled by poetry build, which creates a dist directory containing a .tar.gz source archive and a .whl wheel file. Uploading to the official Python Package Index can be done with username and password, though API tokens are the recommended method because they can be scoped and revoked without affecting account credentials. Configuring a token is done once with poetry config pypi-token.pypi <token>, after which poetry publish will use it to upload. When testing a release before publishing for real, TestPyPI provides a safer target. Poetry can be directed to use TestPyPI by declaring it as a named repository and then publishing to it.
poetry config repositories.testpypi https://test.pypi.org/legacy/
poetry publish -r testpypi
Once uploaded, it is sensible to confirm that the package can be installed in a clean environment using pip install example-package, which verifies that dependencies are correctly declared and wheels are intact.
Continuous Integration with GitHub Actions
Beyond local steps, automation closes the loop. Adding a continuous integration workflow that installs dependencies, runs tests and publishes on a tagged release keeps quality checks and distribution consistent. GitHub Actions provides a hosted environment where Poetry can be installed quickly, dependencies cached and tests executed. A straightforward workflow listens for tags that begin with v, such as v1.0.0, then builds and publishes the package once tests pass. The workflow file sits under .github/workflows and looks like this.
name: Publish to PyPI

on:
  push:
    tags:
      - "v*"

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Check out repository
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - name: Install Poetry
        run: |
          curl -sSL https://install.python-poetry.org | python3 -
          echo "$HOME/.local/bin" >> $GITHUB_PATH
      - name: Install dependencies
        run: poetry install --no-interaction --no-root
      - name: Run tests with pytest
        run: poetry run pytest --maxfail=1 --disable-warnings -q
      - name: Build package
        run: poetry build
      - name: Publish to PyPI
        if: startsWith(github.ref, 'refs/tags/v')
        env:
          POETRY_PYPI_TOKEN_PYPI: ${{ secrets.PYPI_TOKEN }}
        run: poetry publish --no-interaction --username __token__ --password $POETRY_PYPI_TOKEN_PYPI
This arrangement checks out the repository, installs a consistent Python version, brings in Poetry, installs dependencies based on the lock file, runs tests, builds distributions and only publishes when the workflow is triggered by a version tag. The API token used for publishing should be stored as a repository secret named PYPI_TOKEN so it is not exposed in the codebase or logs. Creating the tag is done locally with git tag v1.0.0 followed by git push origin v1.0.0, which triggers the workflow and results in a published package moments later. It is often useful to extend this with a test matrix, so the suite runs across supported Python versions, as well as caching to speed up repeated runs by re-using Poetry and pip caches keyed on the lock file.
Project Structure
Package structure is another place where Poetry encourages clarity. A simple, consistent layout makes maintenance and onboarding easier. A typical library keeps its importable code in a package directory named to match the project name in pyproject.toml, with hyphens translated to underscores. Tests live in a separate tests directory, documentation in docs and examples in a directory of the same name. The repository root contains README.md, a licence file, the lock file and a .gitignore that excludes environment directories and build artefacts. The following tree illustrates a balanced structure for a data-oriented utility library.
data-utils/
├── data_utils/
│   ├── __init__.py
│   ├── core.py
│   ├── io.py
│   ├── analysis.py
│   └── cli.py
├── tests/
│   ├── __init__.py
│   ├── test_core.py
│   └── test_analysis.py
├── docs/
│   ├── index.md
│   └── usage.md
├── examples/
│   └── demo.ipynb
├── README.md
├── LICENSE
├── pyproject.toml
├── poetry.lock
└── .gitignore
Within the package directory, __init__.py can define a public interface and hide internal details. This allows users of the library to import the essentials without needing to know the module layout.
from .core import clean_data
from .analysis import summarise_data
__all__ = ["clean_data", "summarise_data"]
If the project offers a command-line interface, Poetry makes it simple to declare an entry point, so users can run a console command after installation. The scripts section in pyproject.toml maps a command name to a callable, in this case the main function in a cli module.
[tool.poetry.scripts]
data-utils = "data_utils.cli:main"
A basic CLI might be implemented using Click, passing arguments to internal functions and relaying progress.
import click
from data_utils import core

@click.command()
@click.argument("path")
def main(path):
    """Simple CLI example."""
    print(f"Processing {path}...")
    core.clean_data(path)
    print("Done!")

if __name__ == "__main__":
    main()
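Once the project is installed, the entry point declared above becomes available as a console command; a quick way to try it inside the Poetry environment (input.csv standing in for a real path):

poetry run data-utils input.csv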
Git ignores should filter out files that do not belong in version control. A sensible default for a Poetry project is as follows.
__pycache__/
*.pyc
*.pyo
*.pyd
.env
.venv
dist/
build/
*.egg-info/
.cache/
.coverage
Testing and Documentation
Testing sits comfortably alongside this. Many projects adopt pytest because it is straightforward to use and integrates well with Poetry. Running tests through poetry run pytest ensures the virtual environment is used, and a simple unit test demonstrates the pattern.
from data_utils.core import clean_data

def test_clean_data_removes_nulls():
    data = [1, None, 2, None, 3]
    cleaned = clean_data(data)
    assert cleaned == [1, 2, 3]
Documentation can be kept in Markdown or built with tools. MkDocs and Sphinx are common choices for generating websites from your docs, and both can be installed as development dependencies using Poetry. Including notebooks in an examples directory is helpful for illustrating usage in richer contexts, especially for data science libraries. The README should present the essentials succinctly, covering what the project does, how to install it, a short usage example and pointers for development setup. A licence file clarifies terms of use; MIT and Apache 2.0 are widely used options in open source.
Advanced CI: Quality Checks and Multi-version Testing
Once structure, tests and documentation are in order, quality checks can be expanded in the continuous integration workflow. Adding automated formatting, import sorting and linting tightens consistency across contributions. An enhanced workflow uses Black, isort and Flake8 before running tests and building, and also includes a matrix to test across multiple Python versions. It runs on pull requests as well as on tagged pushes, which means code quality and compatibility are verified before merging changes and again before publishing a release.
name: Lint, Test and Publish

on:
  push:
    tags:
      - "v*"
  pull_request:

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9", "3.10", "3.11"]
    steps:
      - name: Check out repository
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install Poetry
        run: |
          curl -sSL https://install.python-poetry.org | python3 -
          echo "$HOME/.local/bin" >> $GITHUB_PATH
      - name: Cache Poetry dependencies
        uses: actions/cache@v4
        with:
          path: |
            ~/.cache/pypoetry
            ~/.cache/pip
          key: poetry-${{ runner.os }}-${{ hashFiles('**/poetry.lock') }}
          restore-keys: |
            poetry-${{ runner.os }}-
      - name: Install dependencies
        run: poetry install --no-interaction --no-root
      - name: Check code formatting with Black
        run: poetry run black --check .
      - name: Check import order with isort
        run: poetry run isort --check-only .
      - name: Run Flake8 linting
        run: poetry run flake8 .
      - name: Run tests with pytest
        run: poetry run pytest --maxfail=1 --disable-warnings -q
      - name: Build package
        run: poetry build
      - name: Publish to PyPI
        if: startsWith(github.ref, 'refs/tags/v')
        env:
          POETRY_PYPI_TOKEN_PYPI: ${{ secrets.PYPI_TOKEN }}
        run: poetry publish --no-interaction --username __token__ --password $POETRY_PYPI_TOKEN_PYPI
This workflow builds on the earlier one by checking style and formatting before tests. If any of those checks fail, the process stops and surfaces the problems in the job logs. Caching based on the lock file reduces the time spent installing dependencies by reusing packages where nothing has changed. The matrix section ensures that the library remains compatible with the declared range of Python versions, which is especially helpful just before a release. It is possible to extend this further with coverage reports using pytest-cov and Codecov, static type checking with mypy, or pre-commit hooks to keep local development consistent with continuous integration. Publishing to TestPyPI in a separate job can help validate packaging without affecting the real index, and once outcomes look good, the main publishing step proceeds when a tag is pushed.
Conclusion
The result of adopting Poetry is a project that states its requirements clearly, installs them reliably and produces distributions without ceremony. For new work, it removes much of the friction that once accompanied Python packaging. For existing projects, the migration path is gentle and reversible, and the gains in determinism often show up quickly in fewer environment-related issues. When paired with a small amount of automation in a continuous integration system, the routine of building, testing and publishing becomes repeatable and visible to everyone on the team. That holds whether the package is destined for internal use on a private index or a public release on PyPI.
A portable software repository comparison: PortableApps versus Portapps for Windows users
Moving between computers remains a fact of life for many people, whether working across office desktops and home laptops, studying in shared facilities or visiting clients and public spaces. Installing the same software repeatedly, then recreating familiar settings, can become a routine that wastes time and raises permission hurdles. Portable software aims to sidestep that friction by running without traditional installation, carrying preferences along for the ride, and leaving little behind on host machines.
Two notable projects occupy this space for Windows users: PortableApps.com and Portapps. Each offers a different route to a similar destination, and together they show how far the idea has progressed since the early days of USB sticks and limited storage. Both platforms enable users to create self-contained software environments that can travel between machines whilst maintaining settings and data integrity.
PortableApps.com: The Established Platform
PortableApps.com is often the first name people encounter, and with good reason. It has grown into a platform as much as a collection, providing a launcher that helps manage the entire portable environment. The project began in the early 2000s, created by John T. Haller, and has remained free and open source since then.
Core Architecture
The premise is straightforward. Applications are repackaged so they can live within a self-contained folder structure that can sit on removable storage or inside a cloud-synchronised folder. When launched from that location, they behave as if they were installed locally, only their configuration and data reside in the portable directory rather than the Windows registry or system folders. As a result, moving the folder to another machine brings the software and its settings along, keeping the host computer cleaner and reducing the need for elevated privileges.
The Platform Ecosystem
Much of the appeal lies in the PortableApps.com Platform, a menu and suite that acts as a hub. Rather than scattering shortcuts across the desktop, the platform collects everything in one place with a menu that can sit on a USB drive or a cloud drive. From here, users can run applications, group them in folders, mark favourites and initiate updates, all with a consistent interface.
The catalogue has grown substantially, now featuring over 1,400 portable packages spanning multiple categories: Accessibility, Development, Education, Games, Graphics & Pictures, Internet, Music & Video, Office, Security and Utilities. This includes major applications like LibreOffice, Firefox, GIMP, VLC media player, and hundreds of specialised tools across every computing category. That breadth helps the platform function as a complete environment rather than a one-off fix for a particular program. A person could keep a preferred browser with extensions and bookmarks, a document editor for quick edits, an image viewer for photos and a handful of diagnostic tools, all launched from the same menu.
Because the platform is designed to operate from cloud-synchronised locations as well, some forgo physical drives and keep their PortableApps directory inside providers like Dropbox or Google Drive. That way, the same set of tools appears on every machine where the cloud client is installed, with settings following through the sync client.
Portapps: The Modular Approach
Running alongside PortableApps is Portapps, an independent collection that also repackages Windows software to run portably, albeit with a different structure. Portapps distributes applications either as portable set-up files or as 7-Zip archives. Each title typically includes a small wrapper executable, named with a "-portable.exe" suffix, that orchestrates the portability layer.
Technical Implementation
That wrapper is written in Go and handles redirection of paths, environment configuration and other adjustments required to run the original application without leaving permanent traces on the host. The project is open source under the MIT licence, and many of its components live on GitHub, where users can watch releases and inspect how builds are constructed.
Usage and Transparency
Running a Portapps package is uncomplicated. After downloading the portable version of a supported application from the Portapps site or the relevant GitHub repository, the user extracts the files and launches the wrapper executable. The wrapper ensures that configuration and data reside in the portable directory and that the program operates without installing into Windows.
Portapps emphasises transparency around its build process. Properties and scripts are published, so observers can see how original sources are obtained and how wrappers are applied. Releases are versioned and binaries are provided, with wrappers scanned on VirusTotal to provide added confidence. The maintainers acknowledge that heuristic scanning can sometimes trigger false positives because of how the wrappers work, a reality that users should weigh against their own antivirus alerts and verification habits.
Application Focus and Updates
Portapps maintains a more selective catalogue of 54 applications, focusing primarily on modern software and developer tools. The collection includes popular applications like Discord, Visual Studio Code, Brave browser, VLC media player, Postman, IntelliJ IDEA, and various communication tools. The project targets contemporary software, particularly applications built with frameworks like Electron, and emphasises quality over quantity in its selections.
Recent releases continue actively, with regular updates to maintained applications. However, some applications are discontinued when the original projects become abandoned or when maintenance becomes unfeasible, demonstrating the project's pragmatic approach to software curation.
Comparison: Platform vs Modular
The distinction between the two projects emerges in how they are structured and managed, rather than in their core aim. This creates different advantages for different use cases.
PortableApps.com Advantages
PortableApps offers a full platform anchored by a launcher. It provides centralised update notifications and the ability to upgrade installed portable applications whilst preserving data. It integrates back-up functions and a customisable interface that collects everything into a single, recognisable menu. This arrangement suits anyone who wants a managed, coherent environment that travels intact from one machine to another, whether on a drive or inside a cloud-synchronised folder.
The platform's maturity shows in its comprehensive feature set: automatic updates, integrated back-up systems, theme customisation and extensive language support. The sheer size of its catalogue (over 1,400 applications across 10 categories) means users can often find portable versions of most common applications they need, from basic utilities to professional software suites.
Portapps Advantages
Portapps takes a per-application approach centred on wrappers. It does not bundle a unified menu or a site-wide update mechanism. Instead, it focuses on packaging individual programs so that each can run on its own from a portable directory. For some, that modularity is appealing because it keeps each application independent and allows for granular control over what gets updated and when.
The transparency of Portapps is particularly notable. All source code, build scripts and packaging processes are openly available on GitHub. This makes it easier for technically inclined users to understand exactly how applications are made portable and to contribute improvements or fixes. The project's focused approach means its 54 applications are typically modern, well-maintained packages that target contemporary software needs, particularly in development and communication tools.
Trade-offs and Limitations
Both approaches share similar constraints. Performance can lag when running from slow USB flash drives, especially with applications that read and write frequently. A modern external SSD or high-quality USB 3.x drive mitigates this, but older media can make the difference noticeable.
Compatibility relies in part on the host Windows installation. Some portable programs require certain components to be present or struggle if the operating system is old or tightly locked down by policy. Security considerations apply to both: a portable device can be lost or stolen, so using encryption or secure storage matters if sensitive data are involved.
Another constraint is access to system-level features. Programs that need drivers, system services or administrative rights may not function as expected in portable form. Updates in Portapps require more manual intervention compared to PortableApps' centralised update system.
Which to Choose
The choice often comes down to preferences and requirements. Those who want a curated catalogue with a central launcher, integrated updates and back-up features will likely benefit from the PortableApps.com Platform. It reduces administrative overhead by keeping everything in one place and by handling upgrades whilst leaving settings untouched.
Those who prefer to choose individual portable packages, appreciate the transparency of wrapper-based builds, or focus on a subset of modern applications may lean towards Portapps. Both coexist comfortably because their aims overlap, yet their methods differ, and nothing stops a user from mixing them if that suits a particular workflow, though running two separate structures does introduce more to manage.
Practical Implementation
Setting up a portable environment generally begins with choosing where it will live. A fast USB 3.x flash drive or an external SSD keeps load times brisk and reduces frustration. If removable media is not desirable, a folder inside a cloud synchronisation service provides similar flexibility, just without the need to carry hardware.
PortableApps Setup
In the case of PortableApps, installing the platform to the chosen location yields a menu that can then be populated with software drawn from the catalogue. Updates can be triggered from within the platform and back-ups made as snapshots of the entire environment. The integrated app store makes discovering and installing new portable applications straightforward.
Portapps Setup
For Portapps, the process is more manual. Individual applications are selected from the website or GitHub, downloaded either as portable set-ups or archives, unpacked to a chosen directory, then started using the "-portable.exe" wrapper. Keeping track of updates often means revisiting the releases page for each application or subscribing to notifications.
Security Considerations
Security merits attention at the outset. Losing a drive can mean losing data, so encrypting the portable directory is wise, whether by encrypting the entire device with tools like BitLocker To Go or by placing the portable environment inside a container created with software such as VeraCrypt. Public or shared machines can carry malware risks, so scanning hosts when possible and treating sensitive actions with caution remains sensible.
Verifying downloads by checking hashes or signatures when provided, and scanning portable applications with antivirus software, adds another layer of reassurance. It is also useful to remember that even well-designed portable applications may leave temporary traces because Windows itself writes certain entries as part of normal operation. The objective is to limit permanent change, not to circumvent the operating system's behaviour entirely.
Performance Optimisation
Performance can be improved with a few choices. Using faster storage makes the largest difference, particularly for larger applications that read and write many files. Keeping the portable directory in a location that remains consistently available to a cloud client avoids sync stalls, and selecting a machine's local drive rather than a slow network path reduces latency. Ensuring that the portable environment is not subject to aggressive antivirus scanning on every read can sometimes help, though that has to be balanced against security policies.
Final Remarks
Portable software has matured from a niche convenience into a practical way of working that respects the realities of shared and changing environments. By focusing on containment, reducing dependency on installation and making updates and back-ups straightforward, projects like PortableApps and Portapps make it easier to carry a personal toolkit across diverse Windows machines.
The two platforms serve overlapping but distinct needs. PortableApps.com excels as a comprehensive, managed environment suitable for users who want everything integrated and maintained through a single interface. Its extensive catalogue and automated features make it particularly attractive for those building complete portable computing environments.
Portapps appeals to users who prefer transparency, modularity and direct control over individual applications. Its open development model and focused approach to specific modern applications make it valuable for technically minded users or those with specific software requirements.
Use cases abound for both approaches. Students and professionals who switch between school, work and home can keep a consistent environment without altering each machine. Technicians often carry diagnostic and repair tools that run without installation so they can assist on any PC they encounter. Travellers value having a browser and email client with their own preferences ready to use on shared computers.
With thought given to security, performance and management, both PortableApps and Portapps can add consistency to a computing life that is increasingly spread across locations and devices, all without imposing on the host systems that make it possible. The choice between them depends on whether one prioritises integrated management or modular control, but both represent mature approaches to an enduring challenge in modern computing.
Python productivity: Building better code through design, performance and scale
Python's success in data science and beyond stems from more than just readable syntax. It represents a coherent philosophy where errors guide development, explicitness prevents bugs, modern tooling enforces quality, performance comes from purpose-built engines, and scaling extends rather than replaces familiar patterns. Understanding these principles transforms everyday coding from a series of individual tasks into a systematic approach to building robust, maintainable and efficient systems.
Error-Driven Development as a Design Philosophy
Python treats errors not as failures, but as design features that surface problems early and prevent subtle defects later. The language embodies an "easier to ask forgiveness than permission" philosophy, attempting operations first and objecting meaningfully when they cannot proceed.
Consider how Python handles basic operations. A SyntaxError appears immediately when code violates grammatical rules: if True print("hello") triggers an immediate complaint with a caret pointing to the problematic location. Python neither guesses intentions nor continues with broken syntax because this guarantee of clear structure keeps code understandable across projects and platforms.
Sequence operations demonstrate similar principles. When code attempts to access lst[5] on a three-element list, Python raises IndexError: list index out of range rather than silently padding or expanding the sequence. This deliberate failure prevents hidden logic errors in loops and aggregations by forcing explicit checks of assumptions about data size.
Dictionary lookups follow the same pattern. Accessing a non-existent key with d['missing'] yields KeyError: 'missing' rather than inventing placeholder values. This explicit failure catches typos and unclear control flow whilst enabling defensive programming patterns through try/except blocks.
Name resolution errors like NameError and UnboundLocalError enforce clear scoping rules without creating variables accidentally or resolving names to unexpected contexts. Type discipline appears at runtime through TypeError for incorrect argument types and ValueError for correct types with inappropriate values. Each error message identifies which contract has been violated, directing fixes to either the object passed or the value it contains.
Assertions provide a final layer of optional verification. The assert statement allows code to state assumptions explicitly, failing with meaningful messages when invariants do not hold. This narrows the search space for defects by making expectations visible and providing immediate context for failures.
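A minimal sketch of these ideas together, using a dictionary lookup in the ask-forgiveness style followed by an assertion stating an invariant (the names here are illustrative):

settings = {"theme": "dark"}

# EAFP: attempt the lookup and handle the explicit failure
try:
    timeout = settings["timeout"]
except KeyError:
    timeout = 30  # fall back to a sensible default

# state the assumption explicitly; this fails loudly if violated
assert timeout > 0, "timeout must be positive"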
Taking these error signals seriously nudges development towards explicitness and clarity, establishing a foundation for all subsequent quality improvements.
Explicitness Over Implicitness
Making intentions clear through code structure prevents ambiguity, aids tooling and simplifies reuse. This principle manifests across multiple areas of Python development, from data structures to function signatures.
Raw dictionaries offer flexibility but create fragility. A typo in a key or missing field becomes a runtime KeyError with no contract about required contents. Using @dataclass to define structured objects like User with id, email, full_name, status and optional last_login provides clear interfaces with minimal overhead. Type hints and IDE support make attribute access unambiguous, whilst construction fails early when required fields are absent.
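A sketch of the User dataclass just described; the field types are assumptions inferred from the field names:

from dataclasses import dataclass
from datetime import datetime

@dataclass
class User:
    id: int
    email: str
    full_name: str
    status: str
    last_login: datetime | None = None  # optional, absent until first login

user = User(id=1, email="jane@example.com", full_name="Jane Doe", status="active")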
For cases requiring validation, pydantic models build on this foundation. An email field declared as EmailStr automatically validates format, while custom validators can restrict status values to specific options such as 'active', 'inactive' or 'pending'. The resulting models are self-documenting and shield downstream code from invalid data.
Function parameters representing closed sets of options benefit from similar treatment. Plain strings invite typos and lack autocomplete support. Defining enums such as OrderStatus with PENDING, SHIPPED and DELIVERED makes possible states explicit whilst helping both developers and tools. Passing OrderStatus.SHIPPED to process_order reveals intention clearly and enables straightforward comparisons against enum members.
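The enum pattern might look like this, using the states named in the text:

from enum import Enum

class OrderStatus(Enum):
    PENDING = "pending"
    SHIPPED = "shipped"
    DELIVERED = "delivered"

def process_order(order_id: int, status: OrderStatus) -> None:
    # comparison against an enum member is explicit and typo-proof
    if status is OrderStatus.SHIPPED:
        print(f"Order {order_id} has been dispatched.")

process_order(42, OrderStatus.SHIPPED)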
Function signatures become clearer through keyword-only arguments, enforced with a bare star in definitions. A function like create_user(name, email, *, admin=False, notify=True, temporary=False) forces call sites to write create_user(..., admin=True, notify=False) rather than passing sequences of ambiguous boolean values. The resulting calls read almost as documentation.
File path operations improve through object-oriented design. The pathlib module treats paths as objects where joining uses natural / syntax, directory creation uses mkdir, suffix changes use with_suffix, and text operations use read_text and write_text. Code becomes shorter, more portable and less prone to string manipulation errors.
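The pathlib operations named above combine naturally; a small sketch:

from pathlib import Path

output_dir = Path("reports") / "2025"          # joining with natural / syntax
output_dir.mkdir(parents=True, exist_ok=True)  # create directories as needed
draft = output_dir / "summary.txt"
draft.write_text("Quarterly summary\n")        # write text in one call
final = draft.with_suffix(".md")               # swap the file extension
final.write_text(draft.read_text())            # copy contents across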
These patterns consistently replace implicit assumptions with explicit contracts, making code intention more visible and reducing the cognitive load of understanding system behaviour.
Structural Code Quality Through Tooling and Patterns
Sustainable code quality emerges from systematic approaches to organisation, testing and maintenance rather than individual discipline alone. Several key patterns and tools work together to create robust, readable codebases.
Control flow benefits from handling error conditions early rather than nesting deeply. Guard clauses invert the traditional structure so that invalid states return immediately, whilst main logic remains non-indented when preconditions are met. A process_payment function checking order.is_valid, then user.has_payment_method, then available funds before performing charges reads linearly. Exceptions during processing are caught precisely, errors logged with context, and functions return deterministically.
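A sketch of that guard-clause shape, with charge and PaymentError standing in for a real payment layer:

import logging

log = logging.getLogger(__name__)

class PaymentError(Exception):
    """Stand-in for a payment-provider failure (illustrative)."""

def charge(user, amount):
    """Stand-in for a real charging call (illustrative)."""

def process_payment(order, user):
    # guard clauses: reject invalid states immediately
    if not order.is_valid:
        return "rejected: invalid order"
    if not user.has_payment_method:
        return "rejected: no payment method"
    if user.balance < order.total:
        return "rejected: insufficient funds"
    # main logic stays unindented once preconditions hold
    try:
        charge(user, order.total)
    except PaymentError as e:
        log.error("charge failed: %s", e)
        return "error"
    return "charged"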
Even beloved list comprehensions have limits. When filtering and transformation logic become complex, sprawling comprehensions become opaque. Extracting predicates into named functions like is_valid_premium_user restores readability by giving conditions clear names. Where multiple checks and transformations are needed, conventional loops with early continue statements may prove more straightforward and debuggable.
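Extracting the predicate looks like this; the user fields are invented for illustration:

def is_valid_premium_user(user: dict) -> bool:
    # a named predicate replaces an inline compound condition
    return user.get("active", False) and user.get("tier") == "premium"

users = [
    {"name": "Ada", "active": True, "tier": "premium"},
    {"name": "Ben", "active": False, "tier": "premium"},
]
premium_names = [u["name"] for u in users if is_valid_premium_user(u)]
print(premium_names)  # ['Ada']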
Pure functions that accept all inputs as parameters and return results without changing external state simplify testing and reuse. Moving from designs where functions mutate global totals and read from global inventories to approaches where calculations accept prices, quantities and discounts as inputs removes hidden coupling. This enables deterministic testing of edge cases and reasoning about code without tracking changing state.
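A pure version of the calculation described might look like this:

def order_total(price: float, quantity: int, discount: float = 0.0) -> float:
    """All inputs are parameters and no external state is touched."""
    return price * quantity * (1 - discount)

assert order_total(10.0, 3) == 30.0
assert order_total(10.0, 4, discount=0.5) == 20.0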
Documentation ties these practices together. Docstrings explaining what functions do, parameters they accept, values they return and including examples make codebases self-explanatory. Combined with tooling, docstrings serve as both reference and executable documentation.
Automation enforces consistency where human attention falters. Formatters like Black, linters like Ruff, static type checkers like mypy and import organisers like isort can run before each commit using pre-commit. Style issues and common mistakes are caught automatically, freeing mental capacity for higher-level concerns.
When handling errors, resist blanket except: statements that swallow everything from syntax errors to keyboard interrupts. Be specific where possible, catching ConnectionError, ValueError or database errors and handling each appropriately. When catch-alls are necessary, prefer except Exception as e: and log full tracebacks so that unexpected failures remain visible and traceable.
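In code, that guidance might look like the following, with the parsing function invented for illustration:

import logging

def parse_count(raw: str) -> int:
    try:
        return int(raw)
    except ValueError:
        # a specific, expected failure handled deliberately
        logging.error("not a number: %r", raw)
        return 0
    except Exception:
        # last resort: log the full traceback so failures stay visible
        logging.exception("unexpected failure while parsing")
        raise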
Performance Through Modern Engines
Once code achieves cleanliness and robustness, performance becomes the next frontier. Traditional tools often leave substantial speed gains on the table, particularly for data-intensive work where single-threaded processing creates bottlenecks on modern hardware.
Polars, a DataFrame library written in Rust, addresses these limitations by making parallelism the default whilst providing both eager and lazy execution modes. Benchmarks on datasets of around 580,000 rows show Polars completing filtering roughly four times faster than Pandas, aggregation over twenty times faster, groupby operations eight times faster, sorting three times faster, and feature engineering five times faster. These gains stem from fundamental architectural differences rather than incremental optimisations.
The performance improvement requires a shift in mental model. Instead of writing sequential operations that execute immediately, you can batch expressions and let Polars parallelise them automatically. Creating both profit and margin with one with_columns call signals that these calculations can proceed together. Lazy evaluation extends this approach further: building pipelines with pl.scan_csv('large_file.csv').filter(...).group_by(...).agg(...).collect() lets Polars construct a query plan and optimise it before execution. Filters are pushed down so that less data reaches later stages, only the selected columns are read, and compatible operations are combined.
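A sketch of both modes, assuming sales.csv and large_file.csv files with the columns shown; only the lazy pipeline's shape comes from the text:

import polars as pl

# Eager: one with_columns call signals the expressions can run together.
df = pl.read_csv("sales.csv").with_columns([
    (pl.col("revenue") - pl.col("cost")).alias("profit"),
    ((pl.col("revenue") - pl.col("cost")) / pl.col("revenue")).alias("margin"),
])

# Lazy: build the plan, let the optimiser prune it, then collect.
result = (
    pl.scan_csv("large_file.csv")
    .filter(pl.col("order_value") > 500)
    .group_by("customer_id")
    .agg(pl.col("order_value").sum())
    .collect()
)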
Expressiveness comes from an expression system applying operations across columns succinctly. Where Pandas encourages thinking in terms of single columns assigned individually, Polars supports expressions like pl.col(['revenue', 'cost']) * 1.1 applied to multiple columns simultaneously. Familiar transformations translate directly: pl.read_csv('sales.csv') replaces pd.read_csv, selection and filtering become df.filter(pl.col('order_value') > 500).select(['customer_id', 'order_value']), new columns are created with df.with_columns(((pl.col('revenue') - pl.col('cost')) / pl.col('revenue')).alias('profit_margin')), and operations utilise all available cores automatically.
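Those translations, sketched with the same hypothetical sales.csv:

import polars as pl

df = pl.read_csv("sales.csv")              # drop-in for pd.read_csv

# One expression applied across several columns at once:
df = df.with_columns(pl.col(["revenue", "cost"]) * 1.1)

# Filtering and selection, translated from the Pandas idiom:
high_value = df.filter(pl.col("order_value") > 500).select(["customer_id", "order_value"])

# A derived column, computed in parallel across all cores:
df = df.with_columns(
    ((pl.col("revenue") - pl.col("cost")) / pl.col("revenue")).alias("profit_margin")
)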
Memory efficiency improves through Apache Arrow's columnar format, storing data more compactly and avoiding NumPy-based overhead. CSV files of around 2 GB requiring roughly 10 GB of RAM in Pandas often process in approximately 4 GB with Polars. This difference can determine whether workflows run smoothly on laptops or require chunking strategies.
Scaling Beyond Single Processes
When single processes reach their limits, two prominent approaches help scale Python across cores and machines whilst preserving familiar patterns and mental models.
Dask extends NumPy, Pandas and scikit-learn idioms to larger-than-memory datasets by partitioning arrays, DataFrames and computations, then scheduling them in parallel. Its primary abstractions are dask.dataframe and dask.array, along with delayed task graphs. It excels at scalable batch processing, feature engineering and out-of-core work where the mental model stays close to the PyData stack. Integration with scikit-learn and XGBoost is mature, its work-stealing scheduler is sophisticated, and detailed dashboards provide visibility. Clusters can be managed natively or through systems like Kubernetes and YARN.
For large-scale data cleaning and feature engineering, Dask provides natural extensions. Reading many CSV files from storage with dd.read_csv('s3://data/large-dataset-*.csv'), filtering rows with df[df['amount'] > 100], applying transformations per partition, then writing Parquet with df.to_parquet('s3://processed/output/') looks like Pandas but runs in parallel and out of core. Array computations through dask.array handle chunked operations so that x.mean(axis=0).compute() runs across partitions without exhausting memory.
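A sketch of that pipeline; the S3 paths come from the text, the amount column and array sizes are invented, and reading from S3 assumes s3fs is installed:

import dask.dataframe as dd
import dask.array as da

# Many CSVs become one logical, partitioned DataFrame.
df = dd.read_csv("s3://data/large-dataset-*.csv")
df = df[df["amount"] > 100]                # familiar Pandas-style filtering
df.to_parquet("s3://processed/output/")    # executes in parallel, out of core

# Chunked arrays: each chunk's mean is computed, then combined.
x = da.random.random((1_000_000, 100), chunks=(100_000, 100))
col_means = x.mean(axis=0).compute()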
Ray takes a more general approach to distributed computing through remote functions and actors. It suits workloads with many independent Python functions, stateful services and complex machine learning pipelines. A growing ecosystem includes Ray Tune for hyperparameter optimisation, Ray Train for multi-GPU training, Ray Serve for model serving, and RLlib for reinforcement learning. Scheduling is dynamic and actor-based, cluster management integrates with cloud providers, and it scales to applications that need fine-grained control and flexibility.
For model training requiring many configuration explorations, Ray Tune provides schedulers and search strategies. Training functions can be wrapped and launched across workers with tune.run, with methods like ASHA stopping unpromising runs early. Integration with popular libraries means scaling experiments requires minimal code changes. Ray Serve turns model classes exposing __call__ methods into scalable services with @serve.deployment and serve.run, handling routing and scaling automatically.
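A compressed sketch of both libraries; the training objective and Model class are invented, and the exact metric-reporting API varies between Ray versions:

from ray import serve, tune

def train_fn(config):
    # Hypothetical objective: quality peaks near lr = 0.05.
    score = -(config["lr"] - 0.05) ** 2
    return {"score": score}                # reported as the trial's final metrics

analysis = tune.run(train_fn, config={"lr": tune.grid_search([0.01, 0.05, 0.1])})

@serve.deployment
class Model:
    def __call__(self, request):
        return {"prediction": 0.5}         # placeholder inference

serve.run(Model.bind())                    # exposes the deployment as a scalable service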
Incremental Adoption and Pragmatic Choices
The most sustainable approach to improving Python productivity involves gradual implementation rather than wholesale changes. Each improvement builds on previous ones, creating compound benefits over time whilst minimising disruption to existing workflows.
Adopting Polars illustrates this principle well. The first step can be simply loading data with pl.read_csv('big_file.csv') for faster I/O, then converting to Pandas with .to_pandas() if the rest of the pipeline expects Pandas objects. As comfort grows, expression-oriented patterns yield dividends: filtering then adding multiple columns in single chained calls, so Polars can optimise across steps. Full benefits appear when entire pipelines are expressed lazily, but this transition can happen gradually as understanding deepens.
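That progression, sketched with an invented amount column:

import polars as pl

# Step one: use Polars only for the faster read, then hand back to Pandas.
df = pl.read_csv("big_file.csv")
pdf = df.to_pandas()                       # requires pandas (and pyarrow) installed

# Step two: chain filtering and column creation so Polars optimises across steps.
summary = df.filter(pl.col("amount") > 0).with_columns([
    (pl.col("amount") * 0.2).alias("tax"),
    (pl.col("amount") * 1.2).alias("gross"),
])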
Similarly, clean code practices can be introduced incrementally. Start by letting error messages guide fixes rather than suppressing them. Refactor one fragile dictionary into a dataclass when maintenance becomes painful. Extract a complex list comprehension into a named function when debugging becomes difficult. Each change teaches principles that apply more broadly whilst delivering immediate benefits.
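The dictionary-to-dataclass step, as a before-and-after sketch with invented fields:

from dataclasses import dataclass

# Before: typos in keys fail silently, and nothing documents the shape.
user = {"name": "Ada", "email": "ada@example.com", "admin": False}

# After: named, type-hinted fields that editors and mypy can check.
@dataclass
class User:
    name: str
    email: str
    admin: bool = False

user = User(name="Ada", email="ada@example.com")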
Scaling decisions are often pragmatic rather than theoretical. If work centres on DataFrames and arrays with minimal conceptual shift from Pandas or NumPy, Dask likely delivers what you need. If workloads mix training, tuning and serving or require orchestrating many concurrent Python tasks with fine-grained control, Ray's abstractions and libraries provide better matches. Trying each approach on representative workflow slices quickly clarifies which will serve best.
The choice between tools should be driven by actual requirements rather than perceived sophistication. A single machine with Polars may outperform a small cluster running Pandas. A well-structured monolithic application may be more maintainable than a prematurely distributed system. The key is understanding when complexity serves genuine needs rather than adding overhead.
Synthesis: The Compound Nature of Python Productivity
These themes work together rather than in isolation. Error-driven development creates habits that surface problems early. Explicit code structures make intentions clear to both humans and tools. Quality practices through tooling and patterns create sustainable foundations. Modern engines provide performance without sacrificing readability. Scaling approaches extend familiar patterns rather than replacing them. Incremental adoption ensures changes compound rather than disrupt.
The result is a coherent approach to Python development where each improvement reinforces others. Explicit data structures work better with static type checkers. Pure functions are easier to test and parallelise. Clean error handling integrates naturally with distributed systems. Modern DataFrame engines benefit from lazy evaluation patterns that also improve code clarity.
This synthesis explains Python's enduring appeal in data science and beyond. The language welcomes beginners with approachable syntax, whilst scaling to demanding production work without losing clarity. The ecosystem encourages practices that speed up teams over time rather than optimising for immediate gratification. The same principles that guide small scripts apply to large systems, creating a path for continuous improvement rather than periodic rewrites.
Start small: let error messages guide one fix, refactor one fragile dictionary into a dataclass, switch one slow operation to Polars, or run one hyperparameter sweep with Ray Tune. The improvements compound, and the foundations established early enable sophisticated capabilities later without fundamental changes to approach or mindset.