TOPIC: PYTHON
Managing Python projects with Poetry
4th October 2025Python Poetry has become a popular choice for managing Python projects because it unifies tasks that once required several tools. Instead of juggling pip
for installation, virtualenv
for isolation and setuptools
for packaging, Poetry brings these strands together and aims to make everyday development feel predictable and tidy. It sits in the same family of all-in-one managers as npm
for JavaScript and Cargo for Rust, offering a coherent workflow that spans dependency declaration, environment management and package publishing.
At the heart of Poetry is a simple idea: declare what a project needs in one place and let the tool do the orchestration. Projects describe their dependencies, development tools and metadata in a single configuration file, and Poetry ensures that what is installed on one machine can be replicated on another without nasty surprises. That reliability comes from the presence of a lock file. Once dependencies are resolved, their exact versions are recorded, so future installations repeat the same outcome. The intent here is not only convenience but determinism, helping teams avoid the "works on my machine" refrain that haunts software work.
- Core Concepts: Configuration and Lock Files
Two files do the heavy lifting. The pyproject.toml
file is where a project announces its name, version and description, as well as the dependencies required to run and to develop it. The poetry.lock
file captures the concrete resolution of those requirements at a particular moment. Together, they give you an auditable, repeatable picture of your environment. The structure of TOML keeps the configuration readable, and it spares developers from spreading equivalent settings across setup.cfg
, setup.py
and requirements.txt
. A minimal example shows how this looks in practice.
[tool.poetry]
name = "my_project"
version = "0.1.0"
description = "Example project using Poetry"
authors = ["John <john@example.com>"]
[tool.poetry.dependencies]
python = "^3.10"
requests = "^2.31.0"
[tool.poetry.dev-dependencies]
pytest = "^8.0.0"
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
- Essential Commands
Working with Poetry day to day quickly becomes a matter of a few memorable commands. Initialising a project configuration starts with poetry init
, which steps through the creation of pyproject.toml
interactively. Adding a dependency is handled by poetry add
followed by the package name. Installing everything described in the configuration is done with poetry install
, which writes or updates the lock file. When it is time to refresh dependencies within permitted version ranges, poetry update re-resolves and updates what's installed. Removing a dependency is poetry remove
, followed by the package name. For environment management, poetry shell opens a shell inside the virtual environment managed by Poetry, and poetry run allows execution of commands within that same environment without entering a shell. Building distributions is as simple as poetry build, which produces a wheel and a source archive, and publishing to the Python Package Index is managed by poetry publish with credentials or an API token.
- Advantages and Considerations
There are clear advantages to taking this route. The dependency experience is simplified because you do not need to keep updating a requirements.txt
file by hand. With a lock file in place, environments are reproducible across developer machines and continuous integration runners, which stabilises builds and testing. Packaging is integrated rather than an extra chore, so producing and publishing a release becomes a repeatable process that sits naturally alongside development. Virtual environments are created and activated on demand, keeping projects isolated from one another with little ceremony. The configuration in TOML has the benefit of being structured and human-readable, which reduces the likelihood of configuration drift.
There are also points to consider before adopting Poetry. Projects that are deeply invested in setup.py or complex legacy build pipelines may need a clean migration to pyproject.toml
for avoiding clashes. Developers who prefer manual venv
and pip workflows can find Poetry opinionated at first because it expects to be responsible for the environment and dependency resolution. It is also designed with modern Python versions in mind, with examples here using Python 3.10.
- Migration from pip and requirements.txt
For teams arriving from pip
and requirements.txt
, moving to Poetry can be done in measured steps. The starting point is installation. Poetry provides an installer script that sets up the tool for your user account.
curl -sSL https://install.python-poetry.org | python3 -
If the installer does not add Poetry to your PATH, adding $HOME/.local/bin to PATH resolves that, after which poetry --version confirms the installation. From the root of your existing project, poetry init creates a new pyproject.toml
and invites you to provide metadata and dependencies. If you already maintain requirements.txt
files for production and development dependencies, Poetry can ingest those in one sweep. A single file can be imported with poetry add $(cat requirements.txt)
. Where development dependencies live in a separate file, they can be added into Poetry's dev group with poetry add --group dev $(cat dev-requirements.txt)
. Once added, Poetry resolves and pins exact versions, leaving a lock file behind to capture the resolution. After verifying that everything installs and tests pass, it becomes safe to retire earlier environment artefacts. Many teams remove requirements.txt
entirely if they plan to rely solely on Poetry, delete any Pipfile
and Pipfile.lock
remnants left by Pipenv and migrate metadata away from setup.py or setup.cfg in favour of pyproject.toml
. With that done, using the environment becomes routine. Opening a shell inside the virtual environment with poetry shell makes commands such as python
or pytest
use the isolated interpreter. If you prefer to avoid entering a shell, poetry run python script.py
or poetry run pytest
executes the command in the right context.
- Package Publishing
Publishing a package is one of the areas where Poetry streamlines the steps. Accurate metadata in pyproject.toml
is important, so name, version, description and other fields should be up-to-date. An example configuration shows commonly used fields.
[tool.poetry]
name = "example-package"
version = "1.0.0"
description = "A simple example package"
authors = ["John <john@example.com>"]
license = "MIT"
readme = "README.md"
homepage = "https://github.com/john/example-package"
repository = "https://github.com/john/example-package"
keywords = ["example", "poetry"]
With metadata set, building the distribution is handled by poetry build, which creates a dist directory containing a .tar.gz
source archive and a .whl
wheel file. Uploading to the official Python Package Index can be done with username and password, though API tokens are the recommended method because they can be scoped and revoked without affecting account credentials. Configuring a token is done once with poetry config pypi-token.pypi
, after which poetry publish will use it to upload. When testing a release before publishing for real, TestPyPI
provides a safer target. Poetry supports multiple sources and can be directed to use TestPyPI
by declaring it as a repository and then publishing to it.
[[tool.poetry.source]]
name = "testpypi"
url = "https://test.pypi.org/legacy/"
poetry publish -r testpypi
Once uploaded, it is sensible to confirm that the package can be installed in a clean environment using pip install example-package, which verifies that dependencies are correctly declared and wheels are intact.
- Continuous Integration with GitHub Actions
Beyond local steps, automation closes the loop. Adding a continuous integration workflow that installs dependencies, runs tests and publishes on a tagged release keeps quality checks and distribution consistent. GitHub Actions provides a hosted environment where Poetry can be installed quickly, dependencies cached and tests executed. A straightforward workflow listens for tags that begin with v
, such as v1.0.0
, then builds and publishes the package once tests pass. The workflow file sits under .github/workflows
and looks like this.
name: Publish to PyPI
on:
push:
tags:
- "v*"
jobs:
build:
runs-on: ubuntu-latest
steps:
- name: Check out repository
uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.10"
- name: Install Poetry
run: |
curl -sSL https://install.python-poetry.org | python3 -
echo "$HOME/.local/bin" >> $GITHUB_PATH
- name: Install dependencies
run: poetry install --no-interaction --no-root
- name: Run tests with pytest
run: poetry run pytest --maxfail=1 --disable-warnings -q
- name: Build package
run: poetry build
- name: Publish to PyPI
if: startsWith(github.ref, 'refs/tags/v')
env:
POETRY_PYPI_TOKEN_PYPI: ${{ secrets.PYPI_TOKEN }}
run: poetry publish --no-interaction --username __token__ --password $POETRY_PYPI_TOKEN_PYPI
This arrangement checks out the repository, installs a consistent Python version, brings in Poetry, installs dependencies based on the lock file, runs tests, builds distributions and only publishes when the workflow is triggered by a version tag. The API token used for publishing should be stored as a repository secret named PYPI_TOKEN
so it is not exposed in the codebase or logs. Creating the tag is done locally with git tag v1.0.0
followed by git push origin v1.0.0
, which triggers the workflow and results in a published package, moments later. It is often useful to extend this with a test matrix, so the suite runs across supported Python versions, as well as caching to speed up repeated runs by re-using Poetry and pip caches keyed on the lock file.
- Project Structure
Package structure is another place where Poetry encourages clarity. A simple, consistent layout makes maintenance and onboarding easier. A typical library keeps its importable code in a package directory named to match the project name in pyproject.toml
, with hyphens translated to underscores. Tests live in a separate tests directory, documentation in docs and examples in a directory of the same name. The repository root contains README.md
, a licence file, the lock file and a .gitignore
that excludes environment directories and build artefacts. The following tree illustrates a balanced structure for a data-oriented utility library.
data-utils/
├── data_utils/
│ ├── __init__.py
│ ├── core.py
│ ├── io.py
│ ├── analysis.py
│ └── cli.py
├── tests/
│ ├── __init__.py
│ ├── test_core.py
│ └── test_analysis.py
├── docs/
│ ├── index.md
│ └── usage.md
├── examples/
│ └── demo.ipynb
├── README.md
├── LICENSE
├── pyproject.toml
├── poetry.lock
└── .gitignore
Within the package directory, init.py can define a public interface and hide internal details. This allows users of the library to import the essentials without needing to know the module layout.
from .core import clean_data
from .analysis import summarise_data
__all__ = ["clean_data", "summarise_data"]
If the project offers a command-line interface, Poetry makes it simple to declare an entry point, so users can run a console command after installation. The scripts section in pyproject.toml
maps a command name to a callable, in this case the main function in a cli
module.
[tool.poetry.scripts]
data-utils = "data_utils.cli:main"
A basic CLI might be implemented using Click, passing arguments to internal functions and relaying progress.
import click
from data_utils import core
@click.command()
@click.argument("path")
def main(path):
"""Simple CLI example."""
print(f"Processing {path}...")
core.clean_data(path)
print("Done!")
if __name__ == "__main__":
main()
Git ignores should filter out files that do not belong in version control. A sensible default for a Poetry project is as follows.
__pycache__/
*.pyc
*.pyo
*.pyd
.env
.venv
dist/
build/
*.egg-info/
.cache/
.coverage
- Testing and Documentation
Testing sits comfortably alongside this. Many projects adopt pytest
because it is straightforward to use and integrates well with Poetry. Running tests through poetry run pytest
ensures the virtual environment is used, and a simple unit test demonstrates the pattern.
from data_utils.core import clean_data
def test_clean_data_removes_nulls():
data = [1, None, 2, None, 3]
cleaned = clean_data(data)
assert cleaned == [1, 2, 3]
Documentation can be kept in Markdown or built with tools. MkDocs
and Sphinx are common choices for generating websites from your docs, and both can be installed as development dependencies using Poetry. Including notebooks in an examples directory is helpful for illustrating usage in richer contexts, especially for data science libraries. The README should present the essentials succinctly, covering what the project does, how to install it, a short usage example and pointers for development setup. A licence file clarifies terms of use; MIT and Apache 2.0 are widely used options in open source.
- Advanced CI: Quality Checks and Multi-version Testing
Once structure, tests and documentation are in order, quality checks can be expanded in the continuous integration workflow. Adding automated formatting, import sorting and linting tightens consistency across contributions. An enhanced workflow uses Black, isort
and Flake8
before running tests and building, and also includes a matrix to test across multiple Python versions. It runs on pull requests as well as on tagged pushes, which means code quality and compatibility are verified before merging changes and again before publishing a release.
name: Lint, Test and Publish
on:
push:
tags:
- "v*"
pull_request:
jobs:
build:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ["3.9", "3.10", "3.11"]
steps:
- name: Check out repository
uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- name: Install Poetry
run: |
curl -sSL https://install.python-poetry.org | python3 -
echo "$HOME/.local/bin" >> $GITHUB_PATH
- name: Cache Poetry dependencies
uses: actions/cache@v4
with:
path: |
~/.cache/pypoetry
~/.cache/pip
key: poetry-${{ runner.os }}-${{ hashFiles('**/poetry.lock') }}
restore-keys: |
poetry-${{ runner.os }}-
- name: Install dependencies
run: poetry install --no-interaction --no-root
- name: Check code formatting with Black
run: poetry run black --check .
- name: Check import order with isort
run: poetry run isort --check-only .
- name: Run Flake8 linting
run: poetry run flake8 .
- name: Run tests with pytest
run: poetry run pytest --maxfail=1 --disable-warnings -q
- name: Build package
run: poetry build
- name: Publish to PyPI
if: startsWith(github.ref, 'refs/tags/v')
env:
POETRY_PYPI_TOKEN_PYPI: ${{ secrets.PYPI_TOKEN }}
run: poetry publish --no-interaction --username __token__ --password $POETRY_PYPI_TOKEN_PYPI
This workflow builds on the earlier one by checking style and formatting before tests. If any of those checks fail, the process stops and surfaces the problems in the job logs. Caching based on the lock file reduces the time spent installing dependencies by reusing packages where nothing has changed. The matrix section ensures that the library remains compatible with the declared range of Python versions, which is especially helpful just before a release. It is possible to extend this further with coverage reports using pytest-cov
and Codecov
, static type checking with mypy
, or pre-commit hooks to keep local development consistent with continuous integration. Publishing to TestPyPI
in a separate job can help validate packaging without affecting the real index, and once outcomes look good, the main publishing step proceeds when a tag is pushed.
- Conclusion
The result of adopting Poetry is a project that states its requirements clearly, installs them reliably and produces distributions without ceremony. For new work, it removes much of the friction that once accompanied Python packaging. For existing projects, the migration path is gentle and reversible, and the gains in determinism often show up quickly in fewer environment-related issues. When paired with a small amount of automation in a continuous integration system, the routine of building, testing and publishing becomes repeatable and visible to everyone on the team. That holds whether the package is destined for internal use on a private index or a public release on PyPI.
Python productivity: Building better code through design, performance and scale
12th September 2025Python's success in data science and beyond stems from more than just readable syntax. It represents a coherent philosophy where errors guide development, explicitness prevents bugs, modern tooling enforces quality, performance comes from purpose-built engines, and scaling extends rather than replaces familiar patterns. Understanding these principles transforms everyday coding from a series of individual tasks into a systematic approach to building robust, maintainable and efficient systems.
- Error-Driven Development as a Design Philosophy
Python treats errors not as failures, but as design features that surface problems early and prevent subtle defects later. The language embodies an "easier to ask forgiveness than permission" philosophy, attempting operations first and objecting meaningfully when they cannot proceed.
Consider how Python handles basic operations. A SyntaxError
appears immediately when code violates grammatical rules: if True print("hello")
triggers an immediate complaint with a caret pointing to the problematic location. Python neither guesses intentions nor continues with broken syntax because this guarantee of clear structure keeps code understandable across projects and platforms.
Sequence operations demonstrate similar principles. When code attempts to access lst[5]
on a three-element list, Python raises IndexError: list index out of range
rather than silently padding or expanding the sequence. This deliberate failure prevents hidden logic errors in loops and aggregations by forcing explicit checks of assumptions about data size.
Dictionary lookups follow the same pattern. Accessing a non-existent key with d['missing']
yields KeyError: 'missing'
rather than inventing placeholder values. This explicit failure catches typos and unclear control flow whilst enabling defensive programming patterns through try
/except
blocks.
Name resolution errors like NameError
and UnboundLocalError
enforce clear scoping rules without creating variables accidentally or resolving names to unexpected contexts. Type discipline appears at runtime through TypeError
for incorrect argument types and ValueError
for correct types with inappropriate values. Each error message identifies which contract has been violated, directing fixes to either the object passed or the value it contains.
Assertions provide a final layer of optional verification. The assert
statement allows code to state assumptions explicitly, failing with meaningful messages when invariants do not hold. This narrows the search space for defects by making expectations visible and providing immediate context for failures.
Taking these error signals seriously nudges development towards explicitness and clarity, establishing a foundation for all subsequent quality improvements.
- Explicitness Over Implicitness
Making intentions clear through code structure prevents ambiguity, aids tooling and simplifies reuse. This principle manifests across multiple areas of Python development, from data structures to function signatures.
Raw dictionaries offer flexibility but create fragility. A typo in a key or missing field becomes a runtime KeyError
with no contract about required contents. Using @dataclass
to define structured objects like User
with id
, email
, full_name
, status
and optional last_login
provides clear interfaces with minimal overhead. Type hints and IDE support make attribute access unambiguous, whilst construction fails early when required fields are absent.
For cases requiring validation, pydantic
models build on this foundation. An email field declared as EmailStr
automatically validates format, while custom validators can restrict status
values to specific options such as 'active', 'inactive' or 'pending'. The resulting models are self-documenting and shield downstream code from invalid data.
Function parameters representing closed sets of options benefit from similar treatment. Plain strings invite typos and lack autocomplete support. Defining enums
such as OrderStatus
with PENDING
, SHIPPED
and DELIVERED
makes possible states explicit whilst helping both developers and tools. Passing OrderStatus.SHIPPED
to process_order
reveals intention clearly and enables straightforward comparisons against enum members.
Function signatures become clearer through keyword-only arguments, enforced with a bare star in definitions. A function like create_user(name, email, *, admin=False, notify=True, temporary=False)
forces call sites to write create_user(..., admin=True, notify=False)
rather than passing sequences of ambiguous boolean values. The resulting calls read almost as documentation.
File path operations improve through object-oriented design. The pathlib
module treats paths as objects where joining uses natural /
syntax, directory creation uses mkdir
, suffix changes use with_suffix
, and text operations use read_text
and write_text
. Code becomes shorter, more portable and less prone to string manipulation errors.
These patterns consistently replace implicit assumptions with explicit contracts, making code intention more visible and reducing the cognitive load of understanding system behaviour.
- Structural Code Quality Through Tooling and Patterns
Sustainable code quality emerges from systematic approaches to organisation, testing and maintenance rather than individual discipline alone. Several key patterns and tools work together to create robust, readable codebases.
Control flow benefits from handling error conditions early rather than nesting deeply. Guard clauses invert the traditional structure so that invalid states return immediately, whilst main logic remains non-indented when preconditions are met. A process_payment
function checking order.is_valid
, then user.has_payment_method
, then available funds before performing charges reads linearly. Exceptions during processing are caught precisely, errors logged with context, and functions return deterministically.
Even beloved list comprehensions have limits. When filtering and transformation logic become complex, sprawling comprehensions become opaque. Extracting predicates into named functions like is_valid_premium_user
restores readability by giving conditions clear names. Where multiple checks and transformations are needed, conventional loops with early continue
statements may prove more straightforward and debuggable.
Pure functions that accept all inputs as parameters and return results without changing external state simplify testing and reuse. Moving from designs where functions mutate global totals and read from global inventories to approaches where calculations accept prices, quantities and discounts as inputs removes hidden coupling. This enables deterministic testing of edge cases and reasoning about code without tracking changing state.
Documentation ties these practices together. Docstrings explaining what functions do, parameters they accept, values they return and including examples make codebases self-explanatory. Combined with tooling, docstrings serve as both reference and executable documentation.
Automation enforces consistency where human attention falters. Formatters like Black, linters like Ruff, static type checkers like mypy
and import organisers like isort
can run before each commit using pre-commit
. Style issues and common mistakes are caught automatically, freeing mental capacity for higher-level concerns.
When handling errors, resist blanket except:
statements that swallow everything from syntax errors to keyboard interrupts. Be specific where possible, catching ConnectionError
, ValueError
or database errors and handling each appropriately. When catch-alls are necessary, prefer except Exception as e:
and log full tracebacks so that unexpected failures remain visible and traceable.
- Performance Through Modern Engines
Once code achieves cleanliness and robustness, performance becomes the next frontier. Traditional tools often leave substantial speed gains on the table, particularly for data-intensive work where single-threaded processing creates bottlenecks on modern hardware.
Polars, a DataFrame library written in Rust, addresses these limitations by making parallelism the default whilst providing both eager and lazy execution modes. Benchmarks on datasets of around 580,000 rows show Polars completing filtering roughly four times faster than Pandas, aggregation over twenty times faster, groupby
operations eight times faster, sorting three times faster, and feature engineering five times faster. These gains stem from fundamental architectural differences rather than incremental optimisations.
The performance improvement requires a shift in mental model. Instead of writing sequential operations that execute immediately, you can batch expressions and let Polars parallelise them automatically. Creating both profit
and margin
with one with_columns
call signals that these calculations can proceed together. Lazy evaluation extends this approach further. Building pipelines with pl.scan_csv('large_file.csv').filter(...).group_by(...).agg(...).collect()
lets Polars construct query plans, then optimises them before execution. Filters are pushed down so less data reaches later stages, only selected columns are read, and compatible operations are combined.
Expressiveness comes from an expression system applying operations across columns succinctly. Where Pandas encourages thinking in terms of single columns assigned individually, Polars supports expressions like pl.col(['revenue', 'cost']) * 1.1
applied to multiple columns simultaneously. Familiar transformations translate directly: pl.read_csv('sales.csv')
replaces pd.read_csv
, selection and filtering become df.filter(pl.col('order_value') > 500).select(['customer_id', 'order_value'])
, new columns are created with df.with_columns(((pl.col('revenue') - pl.col('cost')) / pl.col('revenue')).alias('profit_margin'))
, and operations utilise all available cores automatically.
Memory efficiency improves through Apache Arrow's columnar format, storing data more compactly and avoiding NumPy-based overhead. CSV files of around 2 GB requiring roughly 10 GB of RAM in Pandas often process in approximately 4 GB with Polars. This difference can determine whether workflows run smoothly on laptops or require chunking strategies.
- Scaling Beyond Single Processes
When single processes reach their limits, two prominent approaches help scale Python across cores and machines whilst preserving familiar patterns and mental models.
Dask extends NumPy, Pandas and scikit-learn idioms to larger-than-memory datasets by partitioning arrays, DataFrames and computations then scheduling them in parallel. Its primary abstractions are dask.dataframe
and dask.array
, along with delayed task graphs. It excels for scalable batch processing, feature engineering and out-of-core work where the mental model remains close to the PyData
stack. Integration with scikit-learn and XGBoost is mature, work-stealing schedulers are sophisticated, and detailed dashboards provide visibility. Clusters can be managed natively or through systems like Kubernetes and YARN.
For large-scale data cleaning and feature engineering, Dask provides natural extensions. Reading many CSV files from storage with dd.read_csv('s3://data/large-dataset-*.csv')
, filtering rows with df[df['amount'] > 100]
, applying transformations per partition, then writing Parquet with df.to_parquet('s3://processed/output/')
looks like Pandas but runs in parallel and out of core. Array computations through dask.array
handle chunked operations so that x.mean(axis=0).compute()
runs across partitions without exhausting memory.
Ray takes a more general approach to distributed computing through remote functions and actors. It suits workloads with many independent Python functions, stateful services and complex machine learning pipelines. A growing ecosystem includes Ray Tune for hyperparameter optimisation, Ray Train for multi-GPU training, Ray Serve for model serving, and RLlib
for reinforcement learning. Scheduling is dynamic and actor-based, cluster management integrates with cloud providers, and scalability handles applications requiring control and flexibility.
For model training requiring many configuration explorations, Ray Tune provides schedulers and search strategies. Training functions can be wrapped and launched across workers with tune.run
, with methods like ASHA stopping unpromising runs early. Integration with popular libraries means scaling experiments requires minimal code changes. Ray Serve turns model classes exposing __call__
methods into scalable services with @serve.deployment
and serve.run
, handling routing and scaling automatically.
- Incremental Adoption and Pragmatic Choices
The most sustainable approach to improving Python productivity involves gradual implementation rather than wholesale changes. Each improvement builds on previous ones, creating compound benefits over time whilst minimising disruption to existing workflows.
Adopting Polars illustrates this principle well. The first step can be simply loading data with pl.read_csv('big_file.csv')
for faster I/O, then converting to Pandas with .to_pandas()
if the rest of the pipeline expects Pandas objects. As comfort grows, expression-oriented patterns yield dividends: filtering then adding multiple columns in single chained calls, so Polars can optimise across steps. Full benefits appear when entire pipelines are expressed lazily, but this transition can happen gradually as understanding deepens.
Similarly, clean code practices can be introduced incrementally. Start by letting error messages guide fixes rather than suppressing them. Refactor one fragile dictionary into a dataclass when maintenance becomes painful. Extract a complex list comprehension into a named function when debugging becomes difficult. Each change teaches principles that apply more broadly whilst delivering immediate benefits.
Scaling decisions are often pragmatic rather than theoretical. If work centres on DataFrames and arrays with minimal conceptual shift from Pandas or NumPy, Dask likely delivers what you need. If workloads mix training, tuning and serving or require orchestrating many concurrent Python tasks with fine-grained control, Ray's abstractions and libraries provide better matches. Trying each approach on representative workflow slices quickly clarifies which will serve best.
The choice between tools should be driven by actual requirements rather than perceived sophistication. A single machine with Polars may outperform a small cluster running Pandas. A well-structured monolithic application may be more maintainable than a prematurely distributed system. The key is understanding when complexity serves genuine needs rather than adding overhead.
- Synthesis: The Compound Nature of Python Productivity
These themes work together rather than in isolation. Error-driven development creates habits that surface problems early. Explicit code structures make intentions clear to both humans and tools. Quality practices through tooling and patterns create sustainable foundations. Modern engines provide performance without sacrificing readability. Scaling approaches extend familiar patterns rather than replacing them. Incremental adoption ensures changes compound rather than disrupt.
The result is a coherent approach to Python development where each improvement reinforces others. Explicit data structures work better with static type checkers. Pure functions are easier to test and parallelise. Clean error handling integrates naturally with distributed systems. Modern DataFrame engines benefit from lazy evaluation patterns that also improve code clarity.
This synthesis explains Python's enduring appeal in data science and beyond. The language welcomes beginners with approachable syntax, whilst scaling to demanding production work without losing clarity. The ecosystem encourages practices that speed up teams over time rather than optimising for immediate gratification. The same principles that guide small scripts apply to large systems, creating a path for continuous improvement rather than periodic rewrites.
Start small: let error messages guide one fix, refactor one fragile dictionary into a dataclass, switch one slow operation to Polars, or run one hyperparameter sweep with Ray Tune. The improvements compound, and the foundations established early enable sophisticated capabilities later without fundamental changes to approach or mindset.
PandasGUI: A simple solution for Pandas DataFrame inspection from within VSCode
2nd September 2025One of the things that I miss about Spyder when running Python scripts is the ability to look at DataFrames easily. Recently, I was checking a VAT return only for tmux to truncate how much of the DataFrame I could see in output from the print function. While closing tmux might have been an idea, I sought the DataFrame windowing alternative. That led me to the pandasgui
package, which did exactly what I needed, apart from pausing the script execution to show me the data. The installed was done using pip:
pip install pandasgui
Once that competed, I could use the following code construct to accomplish what I wanted:
import pandasgui
pandasgui.show(df)
In my case, there were several lines between the two lines above. Nevertheless, the first line made the pandasgui
package available to the script, while the second one displayed the DataFrame in a GUI with scrollbars and cells, among other things. That was close enough to what I wanted to leave me able to complete the task that was needed of me.
Fixing Python path issues after Homebrew updates on Linux Mint
30th August 2025With Python available by default, it is worth asking how the version on my main Linux workstation is made available courtesy of Homebrew. All that I suggest is that it either was needed by something else or I fancied having a newer version that was available through the Linux Mint repos. Regardless of the now vague reason for doing so, it meant that I had some work to do after running the following command to update and upgrade all my Homebrew packages:
brew update; brew upgrade
The first result was this message when I tried running a Python script afterwards:
-bash: /home/linuxbrew/.linuxbrew/bin/python3: No such file or directory
The solution was to issue the following command to re-link Python:
brew link --overwrite python@3.13
Since you may have a different version by the time that you read this, just change 3.13 above to whatever you have on your system. All was not quite sorted for me after that, though.
My next task was to make Pylance look in the right place for Python packages because they had been moved too. Initial inquiries were suggesting complex if robust solutions. Instead, I went for a simpler fix. The first step was to navigate to File > Preferences > Settings in the menus. Then, I sought out the Open Settings (JSON) icon in the top right of the interface and clicked on it to open a JSON containing VSCode settings. Once in there, I edited the file to end up with something like this:
"python.analysis.extraPaths": [
"/home/[account name]/.local/bin",
"/home/[account name]/.local/lib/python[python version]/site-packages"
]
Clearly, your [account name] and [python version] need to be filled in above. That approach works for me so far, leaving the more complex alternative for later should I come to need that.
Claude Projects: Reusing your favourite AI prompts
28th March 2025Some things that I do with Anthropic Claude, I end up repeating. Generating titles for pieces of text or rewriting text to make it read better are activities that happen a lot. Others would include the generation of single word previews for a piece or creating a summary.
Python or R scripts come in handy for summarisation, either for a social media post or for introduction into other content. In fact, this is how I go much of the time. Nevertheless, I found another option: using Projects in the Claude web interface.
These allow you to store a prompt that you reuse a lot in the Project Knowledge panel. Otherwise, you need to supply a title and a description too. Once completed, you just add your text in there for the AI to do the rest. Title generation and text rewriting already are set up like this, and keywords could follow. It is a great way to reuse and refine prompts that you use a lot.
Dealing with this Python error message on Windows: UnicodeEncodeError: 'charmap' codec can't encode characters in position 56-57: character maps to <undefined>
14th March 2025Recently, I got caught out by the above message when summarising some text using Python and Open AI's API while working within VS Code. There was no problem on Linux or macOS, but it was triggered on the Windows command line from within VS Code. Unlike the Julia or R REPL's, everything in Python gets executed in the console like this:
& "C:/Program Files/Python313/python.exe" script.py
The Windows command line shell operated with cp1252 character encoding, and that was tripping up the code like the following:
with open("out.txt", "w") as file:
file.write(new_text)
The cure was to specify the encoding of the output text as utf-8:
with open("out.txt", "w", encoding='utf-8') as file:
file.write(new_text)
After that, all was well and text was written to a file like in the other operating systems. One other thing to note is that the use of backslashes in file paths is another gotcha. Adding an r before the quotes gets around this to escape the contents, like using double backslashes. Using forward slashes is another option.
with open(r"c:\temp\out.txt", "w", encoding='utf-8') as file:
file.write(new_text)
Getting Pylance to recognise locally installed packages in VSCode running on Linux Mint
17th December 2024When using VSCode on Linux Mint, I noticed that it was not finding any Python package installed into my user area, as is my usual way of working. Thus, it was being highlighted as being missing when it was already there.
The solution was to navigate to File > Preferences > Settings and click on the Open Settings (JSON) icon in the top right-hand corner of the app to add something like the following snippet in there.
"python.analysis.extraPaths": [
"/home/[user]/.local/bin"
]
Once you have added your user account to the above, saving the file is the next step. Then, VSCode needs a restart to pick up the new location. After that, the false positives get eliminated.
What to do an error appears when using pip to install Python packages on Linux Mint 22
16th December 2024After upgrading to Linux Mint 22, the following message appeared when attempting to install Python packages using the pip
command:
error: externally-managed-environment
× This environment is externally managed
╰─> To install Python packages system-wide, try apt install
python3-xyz, where xyz is the package you are trying to
install.
If you wish to install a non-Debian-packaged Python package,
create a virtual environment using python3 -m venv path/to/venv.
Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make
sure you have python3-full installed.
If you wish to install a non-Debian packaged Python application,
it may be easiest to use pipx install xyz, which will manage a
virtual environment for you. Make sure you have pipx installed.
See /usr/share/doc/python3.12/README.venv for more information.
note: If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages.
hint: See PEP 668 for the detailed specification.
This will frustrate anyone following how-tos on the web, so users will need to know about it. On something like Linux Mint, the repositories may not be as up-to-date as PyPI, so picking up the very latest version has its advantages. Thus, I initially used the unrecommended --break-system-packages
switch to get things going as before, since doing never broke anything before. While the way of working feels like an overkill in some ways, using pipx
probably is the way forward as long as things work as I want them to do.
There is wisdom in using virtual environments too, especially when AI models are involved. For most of what I get to do, that may be getting too elaborate. Then, deleting or renaming the message file in /usr/lib/python3.12/EXTERNALLY-MANAGED
is tempting if that gets around things, as retrograde as that probably is. After all, I never broke anything before this message started to appear, possibly since my interests are data related.
AttributeError: module 'PIL' has no attribute 'Image'
11th March 2024One of my websites has an online photo gallery. This has been a long-term activity that has taken several forms over the years. Once HTML and JavaScript based, it then was powered by Perl before PHP and MySQL came along to take things from there.
While that remains how it works, the publishing side of things has used its own selection of mechanisms over the same time span. Perl and XML were the backbone until Python and Markdown took over. There was a time when ImageMagick and GraphicsMagick handled image processing, but Python now does that as well.
That was when the error message gracing the title of this post came to my notice. Everything was working well when executed in Spyder, but the message appears when I tried running things using Python on the command line. PIL is the abbreviated name for the Python 3 pillow package; there was one called PIL in the Python 2 days.
For me, pillow loads, resizes and creates new images, which is handy for adding borders and copyright/source information to each image as well as creating thumbnails. All this happens in memory and that makes everything go quickly, much faster than disk-based tools like ImageMagick and GraphicsMagick.
Of course, nothing is going to happen if the package cannot be loaded, and that is what the error message is about. Linux is what I mainly use, so that is the context for this scenario. What I was doing was something like the following in the Python script:
import PIL
Then, I referred to PIL.Image
when I needed it, and this could not be found when the script was run from the command line (BASH). The solution was to add something like the following:
from PIL import Image
That sorted it, and I must have run into trouble with PIL.ImageFilter
too, since I now load it in the same manner. In both cases, I could just refer to Image or ImageFilter as I required and without the dot syntax. However, you need to make sure that there is no clash with anything in another loaded Python package when doing this.
Resolving a clash between Homebrew and Python
22nd November 2022For reasons that I cannot recall now, I installed the Hugo static website generator on my Linux system and web servers using Homebrew. The only reason that I suggest is that it might have been a way to get the latest version at the time because Linux Mint only does major changes like that every two years, keeping it in line with long-term support editions of Ubuntu.
When Homebrew was installed, it changed the lookup path for command line executables by adding the following line to my .bashrc
file:
eval "$(/home/linuxbrew/.linuxbrew/bin/brew shellenv)"
This executed the following lines:
export HOMEBREW_PREFIX="/home/linuxbrew/.linuxbrew";
export HOMEBREW_CELLAR="/home/linuxbrew/.linuxbrew/Cellar";
export HOMEBREW_REPOSITORY="/home/linuxbrew/.linuxbrew/Homebrew";
export PATH="/home/linuxbrew/.linuxbrew/bin:/home/linuxbrew/.linuxbrew/sbin${PATH+:$PATH}";
export MANPATH="/home/linuxbrew/.linuxbrew/share/man${MANPATH+:$MANPATH}:";
export INFOPATH="/home/linuxbrew/.linuxbrew/share/info:${INFOPATH:-}";
While the result suits Homebrew, it changed the setup of Python and its packages on my system. Eventually, this had undesirable consequences, like messing up how Spyder started, so I wanted to change this. There are other things that I have automated using Python and these were not working either.
One way that I have seen suggested is to execute the following command, but I cannot vouch for this:
brew unlink python
What I did was to comment out the offending line in .bashrc
and replace it with the following:
export PATH="$PATH:/home/linuxbrew/.linuxbrew/bin:/home/linuxbrew/.linuxbrew/sbin"
export HOMEBREW_PREFIX="/home/linuxbrew/.linuxbrew";
export HOMEBREW_CELLAR="/home/linuxbrew/.linuxbrew/Cellar";
export HOMEBREW_REPOSITORY="/home/linuxbrew/.linuxbrew/Homebrew";
export MANPATH="/home/linuxbrew/.linuxbrew/share/man${MANPATH+:$MANPATH}:";
export INFOPATH="${INFOPATH:-}/home/linuxbrew/.linuxbrew/share/info";
The first command adds Homebrew paths to the end of the PATH variable rather than the beginning, which was the previous arrangement. This ensures system folders are searched for executable files before Homebrew folders. It also means Python packages load from my user area instead of the Homebrew location, which happened under Homebrew's default configuration. When working with Python packages, remember not to install one version at the system level and another in your user area, as this creates conflicts.
So far, the result of the Homebrew changes is not unsatisfactory, and I will watch for any rough edges that need addressing. If something comes up, then I will set things up in another way.