TOPIC: PIP
Managing Python projects with Poetry
4th October 2025
Python Poetry has become a popular choice for managing Python projects because it unifies tasks that once required several tools. Instead of juggling pip for installation, virtualenv for isolation and setuptools for packaging, Poetry brings these strands together and aims to make everyday development feel predictable and tidy. It sits in the same family of all-in-one managers as npm for JavaScript and Cargo for Rust, offering a coherent workflow that spans dependency declaration, environment management and package publishing.
At the heart of Poetry is a simple idea: declare what a project needs in one place and let the tool do the orchestration. Projects describe their dependencies, development tools and metadata in a single configuration file, and Poetry ensures that what is installed on one machine can be replicated on another without nasty surprises. That reliability comes from the presence of a lock file. Once dependencies are resolved, their exact versions are recorded, so future installations repeat the same outcome. The intent here is not only convenience but determinism, helping teams avoid the "works on my machine" refrain that haunts software work.
- Core Concepts: Configuration and Lock Files
Two files do the heavy lifting. The pyproject.toml file is where a project announces its name, version and description, as well as the dependencies required to run and to develop it. The poetry.lock file captures the concrete resolution of those requirements at a particular moment. Together, they give you an auditable, repeatable picture of your environment. The structure of TOML keeps the configuration readable, and it spares developers from spreading equivalent settings across setup.cfg, setup.py and requirements.txt. A minimal example shows how this looks in practice.
[tool.poetry]
name = "my_project"
version = "0.1.0"
description = "Example project using Poetry"
authors = ["John <john@example.com>"]
[tool.poetry.dependencies]
python = "^3.10"
requests = "^2.31.0"
[tool.poetry.group.dev.dependencies]
pytest = "^8.0.0"
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
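For context, poetry.lock records one entry per resolved package. An abbreviated, illustrative excerpt might look like this, with several fields trimmed for brevity.
[[package]]
name = "requests"
version = "2.31.0"
description = "Python HTTP for Humans."
python-versions = ">=3.7"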
- Essential Commands
Working with Poetry day to day quickly becomes a matter of a few memorable commands. Initialising a project configuration starts with poetry init, which steps through the creation of pyproject.toml interactively. Adding a dependency is handled by poetry add followed by the package name. Installing everything described in the configuration is done with poetry install, which writes or updates the lock file. When it is time to refresh dependencies within permitted version ranges, poetry update re-resolves and updates what's installed. Removing a dependency is poetry remove, followed by the package name. For environment management, poetry shell opens a shell inside the virtual environment managed by Poetry, and poetry run allows execution of commands within that same environment without entering a shell. Building distributions is as simple as poetry build, which produces a wheel and a source archive, and publishing to the Python Package Index is managed by poetry publish with credentials or an API token.
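Condensed into a session, the cycle looks something like this, with requests standing in for any dependency.
poetry init                 # create pyproject.toml interactively
poetry add requests         # declare and install a dependency
poetry install              # install everything and write poetry.lock
poetry update               # refresh within permitted version ranges
poetry remove requests      # drop a dependency again
poetry run pytest           # run a command inside the environment
poetry build                # produce a wheel and a source archive
poetry publish              # upload to PyPI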
- Advantages and Considerations
There are clear advantages to taking this route. The dependency experience is simplified because you do not need to keep updating a requirements.txt file by hand. With a lock file in place, environments are reproducible across developer machines and continuous integration runners, which stabilises builds and testing. Packaging is integrated rather than an extra chore, so producing and publishing a release becomes a repeatable process that sits naturally alongside development. Virtual environments are created and activated on demand, keeping projects isolated from one another with little ceremony. The configuration in TOML has the benefit of being structured and human-readable, which reduces the likelihood of configuration drift.
There are also points to consider before adopting Poetry. Projects that are deeply invested in setup.py or complex legacy build pipelines may need a clean migration to pyproject.toml to avoid clashes. Developers who prefer manual venv and pip workflows can find Poetry opinionated at first because it expects to be responsible for the environment and dependency resolution. It is also designed with modern Python versions in mind, with examples here using Python 3.10.
- Migration from pip and requirements.txt
For teams arriving from pip and requirements.txt, moving to Poetry can be done in measured steps. The starting point is installation. Poetry provides an installer script that sets up the tool for your user account.
curl -sSL https://install.python-poetry.org | python3 -
If the installer does not add Poetry to your PATH, adding $HOME/.local/bin to PATH resolves that, after which poetry --version confirms the installation. From the root of your existing project, poetry init creates a new pyproject.toml and invites you to provide metadata and dependencies. If you already maintain requirements.txt files for production and development dependencies, Poetry can ingest those in one sweep. A single file can be imported with poetry add $(cat requirements.txt), provided it contains plain package specifiers rather than pip options or comments. Where development dependencies live in a separate file, they can be added into Poetry's dev group with poetry add --group dev $(cat dev-requirements.txt). Once added, Poetry resolves and pins exact versions, leaving a lock file behind to capture the resolution. After verifying that everything installs and tests pass, it becomes safe to retire earlier environment artefacts. Many teams remove requirements.txt entirely if they plan to rely solely on Poetry, delete any Pipfile and Pipfile.lock remnants left by Pipenv and migrate metadata away from setup.py or setup.cfg in favour of pyproject.toml. With that done, using the environment becomes routine. Opening a shell inside the virtual environment with poetry shell makes commands such as python or pytest use the isolated interpreter. If you prefer to avoid entering a shell, poetry run python script.py or poetry run pytest executes the command in the right context.
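Condensed, the migration sequence might run as follows; dev-requirements.txt is only an example name, and the $(cat ...) trick assumes the files contain plain specifiers.
poetry init
poetry add $(cat requirements.txt)
poetry add --group dev $(cat dev-requirements.txt)
poetry install
poetry run pytest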
- Package Publishing
Publishing a package is one of the areas where Poetry streamlines the steps. Accurate metadata in pyproject.toml is important, so name, version, description and other fields should be up-to-date. An example configuration shows commonly used fields.
[tool.poetry]
name = "example-package"
version = "1.0.0"
description = "A simple example package"
authors = ["John <john@example.com>"]
license = "MIT"
readme = "README.md"
homepage = "https://github.com/john/example-package"
repository = "https://github.com/john/example-package"
keywords = ["example", "poetry"]
With metadata set, building the distribution is handled by poetry build, which creates a dist directory containing a .tar.gz source archive and a .whl wheel file. Uploading to the official Python Package Index can be done with username and password, though API tokens are the recommended method because they can be scoped and revoked without affecting account credentials. Configuring a token is done once with poetry config pypi-token.pypi <token>, after which poetry publish will use it to upload. When testing a release before publishing for real, TestPyPI provides a safer target. Poetry supports multiple repositories and can be directed to use TestPyPI by configuring it as a named repository and then publishing to it.
poetry config repositories.testpypi https://test.pypi.org/legacy/
poetry publish -r testpypi
Once uploaded, it is sensible to confirm that the package can be installed in a clean environment using pip install example-package, which verifies that dependencies are correctly declared and wheels are intact.
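A throwaway virtual environment is one way to carry out that check; the paths here are arbitrary.
python3 -m venv /tmp/check
/tmp/check/bin/pip install example-package
/tmp/check/bin/python -c "import example_package"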
- Continuous Integration with GitHub Actions
Beyond local steps, automation closes the loop. Adding a continuous integration workflow that installs dependencies, runs tests and publishes on a tagged release keeps quality checks and distribution consistent. GitHub Actions provides a hosted environment where Poetry can be installed quickly, dependencies cached and tests executed. A straightforward workflow listens for tags that begin with v, such as v1.0.0, then builds and publishes the package once tests pass. The workflow file sits under .github/workflows and looks like this.
name: Publish to PyPI
on:
  push:
    tags:
      - "v*"
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Check out repository
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - name: Install Poetry
        run: |
          curl -sSL https://install.python-poetry.org | python3 -
          echo "$HOME/.local/bin" >> $GITHUB_PATH
      - name: Install dependencies
        run: poetry install --no-interaction --no-root
      - name: Run tests with pytest
        run: poetry run pytest --maxfail=1 --disable-warnings -q
      - name: Build package
        run: poetry build
      - name: Publish to PyPI
        if: startsWith(github.ref, 'refs/tags/v')
        env:
          POETRY_PYPI_TOKEN_PYPI: ${{ secrets.PYPI_TOKEN }}
        run: poetry publish --no-interaction
This arrangement checks out the repository, installs a consistent Python version, brings in Poetry, installs dependencies based on the lock file, runs tests, builds distributions and only publishes when the workflow is triggered by a version tag. The API token used for publishing should be stored as a repository secret named PYPI_TOKEN so it is not exposed in the codebase or logs; with the POETRY_PYPI_TOKEN_PYPI variable set, poetry publish picks it up without extra flags. Creating the tag is done locally with git tag v1.0.0 followed by git push origin v1.0.0, which triggers the workflow and results in a published package moments later. It is often useful to extend this with a test matrix, so the suite runs across supported Python versions, as well as caching to speed up repeated runs by re-using Poetry and pip caches keyed on the lock file.
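For reference, cutting and pushing a release tag is a two-command affair.
git tag v1.0.0
git push origin v1.0.0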
- Project Structure
Package structure is another place where Poetry encourages clarity. A simple, consistent layout makes maintenance and onboarding easier. A typical library keeps its importable code in a package directory named to match the project name in pyproject.toml, with hyphens translated to underscores. Tests live in a separate tests directory, documentation in docs and examples in a directory of the same name. The repository root contains README.md, a licence file, the lock file and a .gitignore that excludes environment directories and build artefacts. The following tree illustrates a balanced structure for a data-oriented utility library.
data-utils/
├── data_utils/
│ ├── __init__.py
│ ├── core.py
│ ├── io.py
│ ├── analysis.py
│ └── cli.py
├── tests/
│ ├── __init__.py
│ ├── test_core.py
│ └── test_analysis.py
├── docs/
│ ├── index.md
│ └── usage.md
├── examples/
│ └── demo.ipynb
├── README.md
├── LICENSE
├── pyproject.toml
├── poetry.lock
└── .gitignore
Within the package directory, __init__.py can define a public interface and hide internal details. This allows users of the library to import the essentials without needing to know the module layout.
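# data_utils/__init__.py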
from .core import clean_data
from .analysis import summarise_data
__all__ = ["clean_data", "summarise_data"]
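Users can then import straight from the package top level; a quick, illustrative sketch:
from data_utils import clean_data, summarise_data
cleaned = clean_data([1, None, 2, 3])
summary = summarise_data(cleaned)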
If the project offers a command-line interface, Poetry makes it simple to declare an entry point, so users can run a console command after installation. The scripts section in pyproject.toml maps a command name to a callable, in this case the main function in a cli module.
[tool.poetry.scripts]
data-utils = "data_utils.cli:main"
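After poetry install, the command becomes callable inside the project environment; the file name here is only an example.
poetry install
poetry run data-utils data.csv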
A basic CLI might be implemented using Click, passing arguments to internal functions and relaying progress.
import click
from data_utils import core

@click.command()
@click.argument("path")
def main(path):
    """Simple CLI example."""
    print(f"Processing {path}...")
    core.clean_data(path)
    print("Done!")

if __name__ == "__main__":
    main()
The .gitignore file should filter out files that do not belong in version control. A sensible default for a Poetry project is as follows.
__pycache__/
*.pyc
*.pyo
*.pyd
.env
.venv
dist/
build/
*.egg-info/
.cache/
.coverage
- Testing and Documentation
Testing sits comfortably alongside this. Many projects adopt pytest because it is straightforward to use and integrates well with Poetry. Running tests through poetry run pytest ensures the virtual environment is used, and a simple unit test demonstrates the pattern.
from data_utils.core import clean_data

def test_clean_data_removes_nulls():
    data = [1, None, 2, None, 3]
    cleaned = clean_data(data)
    assert cleaned == [1, 2, 3]
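For completeness, a minimal implementation that would satisfy this test might look like the following; it is an illustrative sketch rather than the code of a real library.
# data_utils/core.py (hypothetical)
def clean_data(data):
    """Return the sequence with None entries removed."""
    return [value for value in data if value is not None]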
Documentation can be kept in Markdown or built with tools. MkDocs and Sphinx are common choices for generating websites from your docs, and both can be installed as development dependencies using Poetry. Including notebooks in an examples directory is helpful for illustrating usage in richer contexts, especially for data science libraries. The README should present the essentials succinctly, covering what the project does, how to install it, a short usage example and pointers for development setup. A licence file clarifies terms of use; MIT and Apache 2.0 are widely used options in open source.
- Advanced CI: Quality Checks and Multi-version Testing
Once structure, tests and documentation are in order, quality checks can be expanded in the continuous integration workflow. Adding automated formatting, import sorting and linting tightens consistency across contributions. An enhanced workflow uses Black, isort and Flake8 before running tests and building, and also includes a matrix to test across multiple Python versions. It runs on pull requests as well as on tagged pushes, which means code quality and compatibility are verified before merging changes and again before publishing a release.
name: Lint, Test and Publish
on:
  push:
    tags:
      - "v*"
  pull_request:
jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9", "3.10", "3.11"]
    steps:
      - name: Check out repository
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install Poetry
        run: |
          curl -sSL https://install.python-poetry.org | python3 -
          echo "$HOME/.local/bin" >> $GITHUB_PATH
      - name: Cache Poetry dependencies
        uses: actions/cache@v4
        with:
          path: |
            ~/.cache/pypoetry
            ~/.cache/pip
          key: poetry-${{ runner.os }}-${{ hashFiles('**/poetry.lock') }}
          restore-keys: |
            poetry-${{ runner.os }}-
      - name: Install dependencies
        run: poetry install --no-interaction --no-root
      - name: Check code formatting with Black
        run: poetry run black --check .
      - name: Check import order with isort
        run: poetry run isort --check-only .
      - name: Run Flake8 linting
        run: poetry run flake8 .
      - name: Run tests with pytest
        run: poetry run pytest --maxfail=1 --disable-warnings -q
      - name: Build package
        run: poetry build
      - name: Publish to PyPI
        # Guard so a tagged build publishes once, not once per matrix entry
        if: startsWith(github.ref, 'refs/tags/v') && matrix.python-version == '3.11'
        env:
          POETRY_PYPI_TOKEN_PYPI: ${{ secrets.PYPI_TOKEN }}
        run: poetry publish --no-interaction
This workflow builds on the earlier one by checking style and formatting before tests. If any of those checks fail, the process stops and surfaces the problems in the job logs. Caching based on the lock file reduces the time spent installing dependencies by reusing packages where nothing has changed. The matrix section ensures that the library remains compatible with the declared range of Python versions, which is especially helpful just before a release, while the guard on the publish step keeps a tagged build from uploading once per matrix entry. It is possible to extend this further with coverage reports using pytest-cov and Codecov, static type checking with mypy, or pre-commit hooks to keep local development consistent with continuous integration. Publishing to TestPyPI in a separate job can help validate packaging without affecting the real index, and once outcomes look good, the main publishing step proceeds when a tag is pushed.
- Conclusion
The result of adopting Poetry is a project that states its requirements clearly, installs them reliably and produces distributions without ceremony. For new work, it removes much of the friction that once accompanied Python packaging. For existing projects, the migration path is gentle and reversible, and the gains in determinism often show up quickly in fewer environment-related issues. When paired with a small amount of automation in a continuous integration system, the routine of building, testing and publishing becomes repeatable and visible to everyone on the team. That holds whether the package is destined for internal use on a private index or a public release on PyPI.
PandasGUI: A simple solution for Pandas DataFrame inspection from within VSCode
2nd September 2025
One of the things that I miss about Spyder when running Python scripts is the ability to look at DataFrames easily. Recently, I was checking a VAT return, only for tmux to truncate how much of the DataFrame I could see in output from the print function. While closing tmux might have been an idea, I sought a DataFrame windowing alternative instead. That led me to the pandasgui package, which did exactly what I needed, apart from pausing the script execution to show me the data. The installation was done using pip:
pip install pandasgui
Once that completed, I could use the following code construct to accomplish what I wanted:
import pandasgui
pandasgui.show(df)
In my case, there were several lines between the two lines above. Nevertheless, the first line made the pandasgui package available to the script, while the second one displayed the DataFrame in a GUI with scrollbars and cells, among other things. That was close enough to what I wanted to leave me able to complete the task that was required of me.
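Going a little further, pandasgui can display several DataFrames at once, with keyword arguments naming the tabs in the viewer window; a small illustrative sketch follows.
import pandas as pd
import pandasgui
df1 = pd.DataFrame({"a": [1, 2, 3]})
df2 = pd.DataFrame({"b": [4, 5, 6]})
# Keyword names label the tabs shown in the GUI
pandasgui.show(inputs=df1, outputs=df2)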
What to do when an error appears while using pip to install Python packages on Linux Mint 22
16th December 2024
After upgrading to Linux Mint 22, the following message appeared when attempting to install Python packages using the pip command:
error: externally-managed-environment
× This environment is externally managed
╰─> To install Python packages system-wide, try apt install
python3-xyz, where xyz is the package you are trying to
install.
If you wish to install a non-Debian-packaged Python package,
create a virtual environment using python3 -m venv path/to/venv.
Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make
sure you have python3-full installed.
If you wish to install a non-Debian packaged Python application,
it may be easiest to use pipx install xyz, which will manage a
virtual environment for you. Make sure you have pipx installed.
See /usr/share/doc/python3.12/README.venv for more information.
note: If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages.
hint: See PEP 668 for the detailed specification.
This will frustrate anyone following how-tos on the web, so users will need to know about it. On something like Linux Mint, the repositories may not be as up-to-date as PyPI, so picking up the very latest version has its advantages. Thus, I initially used the unrecommended --break-system-packages switch to get things going as before, since doing so had never broken anything previously. While that way of working feels like overkill in some ways, using pipx probably is the way forward as long as things work as I want them to.
There is wisdom in using virtual environments too, especially when AI models are involved. For most of what I get to do, that may be getting too elaborate. Then, deleting or renaming the message file at /usr/lib/python3.12/EXTERNALLY-MANAGED is tempting if that gets around things, as retrograde as that probably is. After all, I never broke anything before this message started to appear, possibly since my interests are data related.
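For reference, the two approaches that the message recommends look like this in practice; the package names are only examples.
# Library work: create a virtual environment and install into it
python3 -m venv ~/venvs/demo
~/venvs/demo/bin/pip install requests
# Standalone tools: let pipx manage an environment per application
sudo apt install pipx
pipx install httpie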
Broadening data science horizons: Useful Python packages for working with data
14th October 2021
My response to changes in the technology stack used in clinical research is to develop some familiarity with programming and scripting platforms that complement and compete with SAS, a system with which I have been programming since 2000. While one of these has been R, Python is another that has taken up my attention, and I now also have Julia in my sights as well. There may be others to assess in the fullness of time.
While I first started to explore the Data Science world in the autumn of 2017, it was in the autumn of 2019 that I began to complete LinkedIn training courses on the subject. Good though they were, I find that I need to actually use a tool to better understand it. At that time, I did get to hear about Python packages like Pandas, NumPy, SciPy, Scikit-learn, Matplotlib, Seaborn and Beautiful Soup, though it took until the spring of this year for me to start gaining some hands-on experience with using any of these.
During the summer of 2020, I attended a BCS webinar on the CodeGrades initiative, a programming mentoring scheme inspired by the way classical musicianship is assessed. In fact, one of the main progenitors is a trained classical musician and teacher of classical music who turned to Python programming when starting a family to have a more stable income. The approach is that a student selects a project and works their way through it, with mentoring and periodic assessments carried out in a gentle and discursive manner. Of course, the project has to be engaging for the learning experience to stay the course, and that point came through in the webinar.
That is one lesson that resonates with me, with subjects as diverse as web server performance and the ongoing pandemic supplying data, and there are other sources of public data to examine as well before looking through my own personal archive gathered over the decades. Though some subjects are uplifting while others are more foreboding, the key thing is that they sustain interest and offer opportunities for new learning. Without being able to dream up new things to try, my knowledge of R and Python would not be as extensive as it is, and I hope that it will help with learning Julia too.
In the main, my own learning has been a solo effort with consultation of documentation along with web searches that have brought me to the likes of Real Python, Stack Abuse, Data Viz with Python and R and others for longer tutorials, as well as threads on Stack Overflow. Usually, the web searching begins when I need a steer on a particular task or a way to resolve a particular error or warning message, but books are always worth reading even if that is the slower route. While those from the Dummies series or from O'Reilly have proved most useful so far, I do need to read them more completely than I already have; it is all too tempting to go with the "program and search for solutions as you go" approach instead.
To get going, many choose the Anaconda distribution to get Jupyter notebook functionality, but I prefer a more traditional editor, so Spyder has been my tool of choice for Python programming, and there are others like PyCharm as well. Because Spyder itself is written in Python, it can be installed using pip from PyPI like other Python packages. It has other dependencies like Pylint for code management activities, but these get installed behind the scenes.
The packages that I first met in 2019 may be the mainstays for doing data science, but I have discovered others since then. It also seems that there is porosity between the worlds of R and Python, so you get some Python packages aping R packages, and R has the Reticulate package for executing Python code. There are Python counterparts to such Tidyverse staples as dplyr and ggplot2 in the form of Siuba and Plotnine, respectively. Though the syntax of these packages is not a direct copy of what is executed in R, it is close enough for there to be enough familiarity for added user-friendliness compared to Pandas or Matplotlib. The interoperability does not stop there, for there is SQLAlchemy for connecting to MySQL and other databases (PyMySQL is needed as well), and there also is SASPy for interacting with SAS Viya. A short sketch of the database connection follows.
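This is a minimal, illustrative sketch; the credentials, host, database and table names are placeholders, and it assumes SQLAlchemy and PyMySQL are installed alongside Pandas.
import pandas as pd
from sqlalchemy import create_engine
# PyMySQL supplies the driver behind the mysql+pymysql:// URL scheme
engine = create_engine("mysql+pymysql://user:password@localhost:3306/testdb")
# Pull a table straight into a DataFrame for analysis
df = pd.read_sql("SELECT * FROM sales", engine)
print(df.head())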
While Python may not have the speed of Julia, there are plenty of packages for working with larger workloads. Of these, Dask, Modin and RAPIDS all have their uses for dealing with data volumes that make Pandas code crawl. As if to prove that there are plenty of libraries for various forms of data analytics, data science, artificial intelligence and machine learning, there also are the likes of Keras, TensorFlow and NetworkX. These are just a selection of what is available, and there is always the possibility of checking out others. It may be tempting to stick with the most popular packages all the time, especially when they do so much, but it never hurts to keep an open mind either.
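As a flavour of how little the Pandas idiom changes when scaling up, here is a minimal, illustrative Dask sketch; the file pattern and column names are placeholders.
import dask.dataframe as dd
# Dask reads matching CSV files lazily, splitting them into partitions
ddf = dd.read_csv("sales_*.csv")
# Operations build a task graph; compute() triggers the parallel run
result = ddf.groupby("region")["amount"].mean().compute()
print(result)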