22:32, 13th September 2024
Harness AIDA
Harness AI automates repetitive tasks in software delivery by integrating intelligence across DevOps, testing, security and cost management. It uses contextual learning to optimise workflows, predict failures and enforce governance, reducing manual intervention through adaptive automation and real-time recommendations.
20:33, 26th August 2024
How to loop blocks of code in Ansible
Ansible does not natively support looping over blocks of tasks using the block directive, which causes issues when multiple related tasks need to share the same loop variable. A practical workaround is to move the grouped tasks into a separate file and use the include_tasks directive in the main playbook to reference it, applying the loop at that level instead.
Each task within the included file can then reference the loop variable using the standard {{ item }} syntax, allowing the entire group to iterate correctly over a list of values, such as IP addresses. While this approach requires splitting logic across multiple files, it remains the only reliable method for achieving this behaviour in Ansible.
20:47, 17th April 2024
The Many Ways to Exit in Python
Python offers several methods for terminating a programme, each with varying levels of suitability depending on the context. The most straightforward approach is raising a SystemExit exception, which triggers Python's standard exit routine, including running any registered atexit functions and accepting an optional integer exit code to signal success or failure.
A closely related option is sys.exit(), which ultimately performs the same operation but requires an import and obscures the fact that an exception is being raised. The built-in functions exit() and quit() are also available without importing anything, though they depend on the optional site module and are best reserved for interactive use in the Python REPL rather than in production code.
At the REPL, pressing Ctrl-D sends an end-of-file indicator, a shortcut that also works in many other command-line programmes on Unix-like systems. Finally, os._exit() exists as a lower-level option that bypasses Python's normal exit process entirely, making it highly disruptive and generally unsuitable for everyday use, though it can prove useful in specific debugging scenarios such as stopping a multithreaded programme where raising SystemExit would only halt the current thread.
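As a minimal sketch of the relationship described above, the following shows that sys.exit() and a direct raise of SystemExit both surface as the same exception carrying an exit code. The helper names (exit_code_of, raise_directly, call_sys_exit) are hypothetical and exist only for this illustration; the exception is caught here purely so the codes can be inspected without ending the process.

```python
import sys


def exit_code_of(action):
    """Run `action` and return the code carried by SystemExit, if raised."""
    try:
        action()
    except SystemExit as exc:
        return exc.code
    return None


def raise_directly():
    # Raising the exception needs no import.
    raise SystemExit(2)


def call_sys_exit():
    # sys.exit() raises SystemExit on the caller's behalf.
    sys.exit(3)


direct_code = exit_code_of(raise_directly)
sys_exit_code = exit_code_of(call_sys_exit)
print(direct_code, sys_exit_code)
```

In a real script the exception would normally be left uncaught so that the interpreter runs its usual exit routine.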
15:20, 14th March 2024
Python Enum: How To Build Enumerations in Python
Although Python does not have a built-in enumeration data type, its standard library includes the enum module, which allows developers to create named constants grouped under a class. Enumerations are particularly useful when a variable can only take one of a fixed set of related values, such as days of the week, order statuses or task states, as they improve code clarity, maintainability and type safety by replacing arbitrary numbers or strings with meaningful labels.
Using the Enum class, developers can define members with custom names and values or rely on the auto() helper function to assign values automatically, starting from one or a specified number. Enum members can be accessed by name or value, counted using len() and iterated over just like any standard Python iterable. A practical application of this is a task management system in which a TaskStatus enum defines states such as TODO, IN_PROGRESS, DONE and ABANDONED, with a corresponding Task class that enforces valid state transitions and raises a ValueError when an invalid change is attempted.
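The task management example could be sketched roughly as follows. The member names come from the summary above, but the transition rules in ALLOWED, the update_status method name and the sample task are assumptions made for illustration, not the original article's code.

```python
from enum import Enum, auto


class TaskStatus(Enum):
    # auto() assigns 1, 2, 3, 4 in definition order.
    TODO = auto()
    IN_PROGRESS = auto()
    DONE = auto()
    ABANDONED = auto()


# Hypothetical transition rules, chosen only for this sketch.
ALLOWED = {
    TaskStatus.TODO: {TaskStatus.IN_PROGRESS, TaskStatus.ABANDONED},
    TaskStatus.IN_PROGRESS: {TaskStatus.DONE, TaskStatus.ABANDONED},
    TaskStatus.DONE: set(),
    TaskStatus.ABANDONED: set(),
}


class Task:
    def __init__(self, name):
        self.name = name
        self.status = TaskStatus.TODO

    def update_status(self, new_status):
        """Move to new_status, or raise ValueError if the change is not allowed."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(
                f"Cannot move from {self.status.name} to {new_status.name}"
            )
        self.status = new_status


task = Task("Write report")
task.update_status(TaskStatus.IN_PROGRESS)
task.update_status(TaskStatus.DONE)

# Members can be looked up by value or by name, counted and iterated.
by_value = TaskStatus(2)
by_name = TaskStatus["DONE"]
member_names = [status.name for status in TaskStatus]
```

Keeping the rules in a dictionary keyed by enum member means an unknown or misspelt state cannot slip through as a plain string.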
16:59, 26th February 2024
Python args and kwargs: Demystified
Understanding how to use *args and **kwargs in Python functions allows for greater flexibility in handling variable numbers of arguments. These constructs enable functions to accept an arbitrary number of positional or keyword arguments, which can then be processed within the function body.
The * operator is used to unpack iterables such as lists or strings, allowing their elements to be passed individually to a function. Similarly, the ** operator unpacks dictionaries, either passing their key-value pairs to a function as keyword arguments or merging them into another dictionary.
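A short sketch of both uses follows: collecting arguments in a function signature and unpacking at the call site or inside a literal. The function name describe and the sample values are hypothetical.

```python
def describe(*args, **kwargs):
    """Return the positional arguments as a tuple and the keyword arguments as a dict."""
    return args, kwargs


numbers = [1, 2, 3]
options = {"sep": "-", "end": "!"}

# * spreads the list into positional arguments, ** spreads the dict into keywords.
received = describe(*numbers, **options)

# The same operators work inside list and dict literals.
letters = [*"abc"]
merged = {**{"a": 1, "b": 2}, **{"b": 3}}  # later keys override earlier ones

print(received)
print(letters, merged)
```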
While these features offer powerful capabilities, readability should remain a priority, as overly complex unpacking can obscure the intent of the code. Mastery of these tools enhances the ability to write adaptable and efficient functions, making them essential for developers working with dynamic input scenarios.
16:29, 2nd February 2024
Stress testing of SAS Life Science Analytics Framework versions 5.4 and 5.4.1 identified specific limits for log and listing files when using supported browsers such as Google Chrome, Microsoft Edge and Mozilla Firefox. These limits include maximum line counts for logs and listings, as well as thresholds for errors, warnings and notes, with variations between Edge and Chrome.
Exceeding these thresholds may lead to unpredictable performance issues, such as delayed responses, memory errors, or session termination. Factors like system resources, browser settings and version differences can influence how these limits are experienced.
Recommendations include following best practices to reduce log and listing output, such as using ODS EXCLUDE or suppressing unnecessary information, to mitigate potential disruptions. Differences in architecture between 5.4.x and earlier versions may also affect the likelihood of encountering these issues.
22:26, 31st January 2024
SAS Arrays and DO Loop Made Easy
SAS arrays and DO loops are two fundamental programming tools used in SAS data steps to process multiple variables efficiently. An array groups a set of variables under a single name, allowing operations to be performed across all of them without writing repetitive code, and can handle both numeric and character variables.
The syntax requires specifying an array name, the number of elements and a list of variables, though SAS can automatically calculate the element count using an asterisk. The DO loop complements arrays by enabling iterative processing across a defined range of values, using an index variable to track each iteration.
Together, these tools support a wide range of practical tasks, such as replacing values that exceed a threshold with missing data, extracting substrings from character variables, assigning results to new variables, multiplying values by predefined factors stored in temporary arrays and calculating percentage differences between consecutive variables.
The DIM function is commonly used to determine the number of elements in an array dynamically, while the OF operator allows functions such as sum, mean and min to be applied across all array elements. An alternative to the standard indexed DO loop is the DO OVER loop, which iterates directly over array elements without requiring an explicit index variable.
23:19, 29th January 2024
How to End a Program in Python
Ending a Python programme can be achieved through several methods, each with distinct applications and considerations. The sys.exit() function is commonly used in production code as it raises the SystemExit exception, allowing for controlled termination with an optional exit status or message. The built-in quit() and exit() functions are added by the site module, which may not be loaded, so they are typically avoided outside interactive sessions.
Raising SystemExit directly offers flexibility, particularly when combined with conditional checks. The os._exit() function provides a lower-level approach for immediate termination with a specified status, though it bypasses normal clean-up processes. Catching exceptions that would otherwise go unhandled lets a programme terminate gracefully rather than crash with a traceback.
Additionally, KeyboardInterrupt exceptions allow users to interrupt execution manually, a feature useful during debugging or long-running tasks. Each method serves specific scenarios, requiring careful selection based on the context of the programme's requirements and environment.
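One way to combine these ideas is to translate each outcome into an exit status in a single place. The sketch below is illustrative only: the function names run, broken and interrupted are hypothetical, and 130 is simply the status shells conventionally report after an interrupt by Ctrl-C.

```python
import sys


def run(work):
    """Call `work` and map its outcome to an integer exit status."""
    try:
        work()
    except KeyboardInterrupt:
        # Conventional status for a programme stopped by Ctrl-C.
        return 130
    except Exception as exc:
        # Report the problem on stderr instead of showing a traceback.
        print(f"error: {exc}", file=sys.stderr)
        return 1
    return 0


def broken():
    raise RuntimeError("boom")


def interrupted():
    # Simulates the user pressing Ctrl-C during the work.
    raise KeyboardInterrupt


statuses = [run(lambda: None), run(broken), run(interrupted)]

# sys.exit() also accepts a message, which becomes the exception's code.
try:
    sys.exit("configuration file missing")
except SystemExit as exc:
    message = exc.code

print(statuses, message)
```

A script would typically finish by passing the status returned by run to sys.exit(), leaving the SystemExit uncaught so the interpreter can clean up and report the status to the shell.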
20:01, 22nd January 2024
Flight, DataFusion, Arrow, and Parquet: Using the FDAP Architecture to build InfluxDB 3.0
The FDAP stack, comprising Apache Arrow, Apache Arrow Flight, Apache Arrow DataFusion and Apache Parquet, is the architectural foundation of InfluxDB 3.0, a modern time series database built by InfluxData. Before this stack's emergence, developers building data-centric systems were forced to repeatedly re-implement complex low-level components, consuming resources that could otherwise go toward domain-specific features.
Arrow provides an efficient, standardised in-memory data representation, eliminating the need to define custom memory layouts or data types. Flight handles fast, interoperable network data transfer using a gRPC-based protocol, removing the burden of building and maintaining custom client drivers for multiple programming languages. Parquet offers high-performance columnar storage with compression that, in testing, proved around five times better than specialised time series formats, while also enabling direct interoperability with tools such as Pandas, DuckDB and Snowflake. DataFusion provides a modular, vectorised query execution engine written in Rust, delivering SQL and InfluxQL support without the many years of engineering effort such a system would traditionally require. Together, these components allow development teams to focus investment on the features that differentiate their product rather than rebuilding foundational infrastructure, with the added advantages of open standards governance under the Apache Software Foundation and a broad, shared global contributor base.
11:58, 6th December 2023
Using the RStudio Terminal in the RStudio IDE
The RStudio IDE includes a built-in terminal feature that provides direct access to the system shell without leaving the development environment, supporting full-screen applications such as vim, Emacs and tmux as well as standard command-line operations. Users can open multiple independent terminal sessions, rename them for easier navigation, and send code directly from the editor to the active terminal using keyboard shortcuts.
The terminal buffer retains the last 1000 lines of output and can be transferred to the editor if needed. Although terminal sessions are tied to the lifecycle of the R session, RStudio mitigates potential data loss by saving and restoring open session names, buffers, environment variables and working directories where supported.
Platform differences are notable, with Windows lacking certain features such as busy detection, environment variable capture and buffer restoration for Command Prompt or PowerShell sessions. Advanced users can configure terminal multiplexers such as tmux or screen to keep sessions alive across R session restarts, and those on RStudio Workbench can work around version-switching limitations by modifying their shell configuration file to ensure the correct version of R is prioritised on the path.