Coding Notebook

23:19, 29th January 2024

Ending a Python programme can be achieved through several methods, each with distinct applications and considerations. The sys.exit() function is commonly used in production code: it raises the SystemExit exception, allowing controlled termination with an optional exit status or message. The built-in quit() and exit() functions are provided by the site module, which is not always loaded (for instance, when Python is started with the -S option), so they are best kept to interactive sessions and avoided in production code.

Raising SystemExit directly offers flexibility, particularly when combined with conditional checks. The os._exit() function terminates the process immediately with a specified status, but it bypasses normal clean-up: finally blocks and atexit handlers do not run, and buffered output is not flushed. Catching exceptions that would otherwise go unhandled lets a programme report the problem and terminate gracefully instead of crashing with a traceback.
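As a minimal sketch of the difference between these exit paths, the following uses two hypothetical functions, check_input and main, which are illustrative names and not part of any library. It assumes the caller wants to see what was passed to SystemExit instead of letting the interpreter exit.

```python
import sys


def check_input(items):
    """Hypothetical validation step: stop the programme if there is nothing to do."""
    if not items:
        # Raising SystemExit directly. If this reaches the interpreter, a string
        # argument is printed to stderr and the exit status becomes 1.
        raise SystemExit("no input supplied")


def main(items):
    """Return 0 on success, otherwise whatever was passed to SystemExit."""
    try:
        check_input(items)
        if len(items) > 3:
            sys.exit(2)  # equivalent to: raise SystemExit(2)
        print(f"processing {len(items)} item(s)")
    except SystemExit as exc:
        # SystemExit is an ordinary exception, so it can be intercepted,
        # for example in tests. exc.code holds the value passed to it.
        return exc.code
    finally:
        # Runs for both sys.exit() and a raised SystemExit;
        # os._exit() would skip this block entirely.
        print("cleaning up")
    return 0
```

In a script, ending with sys.exit(main(sys.argv[1:])) would hand the returned value back to the interpreter as the exit status.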

Additionally, pressing Ctrl+C raises a KeyboardInterrupt exception, which lets users interrupt execution manually, a feature useful during debugging or long-running tasks. Each method suits specific scenarios, so the choice depends on the programme's requirements and environment.
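A long-running loop can catch the interruption and stop cleanly. The sketch below assumes a hypothetical long_task function whose work parameter stands in for one unit of real work; both names are illustrative only.

```python
import time


def long_task(steps, work=lambda: time.sleep(0.1)):
    """Hypothetical long-running loop that stops cleanly on Ctrl+C.

    Returns the number of steps completed before finishing or being interrupted.
    """
    completed = 0
    try:
        for _ in range(steps):
            work()
            completed += 1
    except KeyboardInterrupt:
        # Ctrl+C raises KeyboardInterrupt in the main thread; catching it
        # here lets the function report progress instead of showing a traceback.
        print(f"interrupted after {completed} step(s)")
    return completed
```

Catching the exception around the whole loop, not inside it, means a single Ctrl+C ends the task instead of merely skipping one step.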

20:01, 22nd January 2024

Flight, DataFusion, Arrow, and Parquet: Using the FDAP Architecture to build InfluxDB 3.0

The FDAP stack, comprising Apache Arrow, Apache Arrow Flight, Apache Arrow DataFusion and Apache Parquet, is the architectural foundation of InfluxDB 3.0, a modern time series database built by InfluxData. Before this stack's emergence, developers building data-centric systems were forced to repeatedly re-implement complex low-level components, consuming resources that could otherwise go toward domain-specific features.

Arrow provides an efficient, standardised in-memory data representation, eliminating the need to define custom memory layouts or data types. Flight handles fast, interoperable network data transfer using a gRPC-based protocol, removing the burden of building and maintaining custom client drivers for multiple programming languages. Parquet offers high-performance columnar storage with compression that, in testing, proved around five times better than specialised time series formats, while also enabling direct interoperability with tools such as Pandas, DuckDB and Snowflake. DataFusion provides a modular, vectorised query execution engine written in Rust, delivering SQL and InfluxQL support without the many years of engineering effort such a system would traditionally require. Together, these components allow development teams to focus investment on the features that differentiate their product rather than rebuilding foundational infrastructure, with the added advantages of open standards governance under the Apache Software Foundation and a broad, shared global contributor base.

11:58, 6th December 2023

Using the RStudio Terminal in the RStudio IDE

The RStudio IDE includes a built-in terminal feature that provides direct access to the system shell without leaving the development environment, supporting full-screen applications such as vim, Emacs and tmux as well as standard command-line operations. Users can open multiple independent terminal sessions, rename them for easier navigation, and send code directly from the editor to the active terminal using keyboard shortcuts.

The terminal buffer retains the last 1000 lines of output and can be transferred to the editor if needed. Although terminal sessions are tied to the lifecycle of the R session, RStudio mitigates potential data loss by saving and restoring open session names, buffers, environment variables and working directories where supported.

Platform differences are notable, with Windows lacking certain features such as busy detection, environment variable capture and buffer restoration for Command Prompt or PowerShell sessions. Advanced users can configure terminal multiplexers such as tmux or screen to keep sessions alive across R session restarts, and those on RStudio Workbench can work around version-switching limitations by modifying their shell configuration file to ensure the correct version of R is prioritised on the path.

16:47, 3rd November 2023

A SAS RTF Parser Macro with the Aid of an R Package

Extracting data from Rich Text Format (RTF) tables is a common requirement in clinical trial reporting, particularly for validating tables and comparing different versions. A method involving the R package textreadr simplifies this process by parsing RTF files into data frames, which are then converted to SPSS format for import into SAS.

This approach includes steps such as reading the RTF file, adjusting column delimiters, converting factor variables to character format and saving the output. A SAS macro, %RTFParser, integrates this R-based parsing into SAS workflows by calling the textreadr function through a custom interface, allowing the parsed data to be split into individual variables for further analysis. The solution relies on the flexibility of the textreadr package and demonstrates how combining R and SAS can streamline data extraction from complex RTF structures.

16:33, 31st July 2023

Securing your CDN: Why and how should you use SRI

Relying on third-party resources through a Content Delivery Network (CDN) can expose websites to security risks if those external sources are compromised, potentially allowing malicious code to affect users. Subresource Integrity (SRI) addresses this by enabling developers to verify that external scripts and stylesheets remain unchanged when loaded.

This is achieved by adding an integrity attribute to HTML elements, which contains a cryptographic hash of the expected resource. If the remote file differs even slightly, the browser will block its execution, preventing unauthorised modifications.

Implementing SRI involves specifying the hash, typically obtained using tools like an SRI hash generator, and setting the crossorigin attribute to "anonymous" so that the resource is requested without credentials. While SRI enhances security, it requires pinning explicit resource versions, which rules out automatic updates and means that a file changed upstream, even legitimately, will be blocked and may break functionality. Despite these limitations, SRI provides a robust defence against compromised third-party content, ensuring that only verified versions of external resources are used, thereby protecting both websites and their users.
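The integrity value is the hash algorithm name, a hyphen, and the base64-encoded digest of the file's bytes. As a rough sketch of what a hash generator does, the following uses only the Python standard library; sri_hash and script_tag are hypothetical helper names, and the URL is assumed to be safe to place in an attribute without escaping.

```python
import base64
import hashlib


def sri_hash(content: bytes, algorithm: str = "sha384") -> str:
    """Return an SRI integrity value of the form '<algorithm>-<base64 digest>'."""
    if algorithm not in {"sha256", "sha384", "sha512"}:
        raise ValueError("SRI supports sha256, sha384 and sha512")
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def script_tag(url: str, content: bytes) -> str:
    """Build a script element carrying integrity and crossorigin attributes.

    Assumption: url needs no HTML escaping; a real tool should escape it.
    """
    return (
        f'<script src="{url}" integrity="{sri_hash(content)}" '
        'crossorigin="anonymous"></script>'
    )
```

The hash must be computed over the exact bytes the CDN serves, so in practice content would be the downloaded file for the pinned version, not a local copy that might differ.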

19:53, 4th July 2023

LangChain is an open-source framework designed to simplify the development of agents and applications powered by large language models, allowing developers to connect to providers such as OpenAI, Anthropic and Google in fewer than ten lines of code. It offers a standardised interface for interacting with different models, helping to avoid provider lock-in, and its agent architecture is built on top of LangGraph, a lower-level orchestration framework that supports durable execution, human-in-the-loop workflows and persistence.

For those with more straightforward needs, LangChain handles the complexity of LangGraph under the hood, meaning developers do not need direct knowledge of it for basic use. A higher-level option called Deep Agents is also available, offering additional capabilities such as automatic compression of long conversations, a virtual file system and subagent-spawning, while LangGraph itself remains the recommended choice for advanced use cases requiring a combination of deterministic and agentic workflows with heavy customisation. Debugging and tracing of agent behaviour can be carried out using LangSmith, which provides visualisation tools, execution path tracing and runtime metrics.

17:01, 14th June 2023

I'm an R user: Quarto or R Markdown?

Posit introduced Quarto as an open-source publishing system that integrates narrative text and code to generate reports, presentations and websites, offering a command-line interface that simplifies rendering documents outside the RStudio IDE. Unlike R Markdown, which relies on multiple packages for different outputs, Quarto consolidates functionality into a single system, reducing dependencies and streamlining workflows.

It supports language-agnostic code execution, enabling collaboration across teams using different programming languages, while features like Hugo-style includes and shortcodes allow reusable content and dynamic variable insertion. Quarto projects provide shared YAML metadata and variables across documents, enhancing consistency and ease of maintenance.

Global code options, such as document-wide settings and freeze functionality, improve efficiency in managing large-scale outputs. Although R Markdown remains supported, Quarto offers advantages for users seeking unified tools and advanced formatting capabilities, particularly in multilingual or collaborative environments.

09:33, 12th May 2023

NodeSource Node.js Binary Distributions

NodeSource has provided DEB and RPM Node.js binary distributions for over a decade, with more than 100 million downloads a year and millions of developers supported worldwide. The company offers both supported and legacy versions of Node.js, with extended technical assistance available for versions no longer maintained by the open-source project. Alongside its distributions, NodeSource provides a range of services including support, consulting and training, as well as solutions tailored to areas such as API integration, high-performance applications, legacy migration and the Internet of Things.

14:01, 11th May 2023

Julia 1.9 Highlights

Julia 1.9 introduces several enhancements aimed at improving performance, usability and compatibility. The package manager, Pkg, now allows more precise control over updates, with commands like up Foo restricting upgrades to the specified package and add prioritising already installed versions to reduce pre-compilation overhead.

Support for Apple Silicon has been elevated to Tier 1, ensuring robust testing and integration. The update to LLVM v14 enables autovectorisation for the SVE/SVE2 extensions on AArch64 CPUs that implement them, such as Fujitsu’s A64FX. Native Float16 arithmetic is now supported on compatible AArch64 processors, including the A64FX and Apple’s M series, offering significant speed improvements in memory-bound applications.

Additional features include faster test coverage tracking by default, better monorepo support through shared manifest files and refined handling of sub-packages within larger projects. These changes collectively aim to streamline development workflows, optimise computational efficiency and expand hardware compatibility.

09:10, 26th April 2023

Shiny User Adoption Fails: 9 Reasons Why Nobody Uses Your App

Low adoption of Shiny apps often stems from multiple interconnected factors, including insufficient awareness among users, unclear communication of the app's purpose, misalignment with actual user needs, usability challenges and resistance to change. Users may not discover the app, may fail to understand its benefits, or may find it incompatible with their workflows, for example because it requires internet access when they work offline or is poorly optimised for mobile use. Even when users are aware of the app, poor design, unintuitive navigation, or trust issues arising from bugs or incorrect data can deter engagement. Performance bottlenecks, such as slow loading times, and competition from existing tools like Excel may reduce adoption further.

Addressing these issues requires proactive user research, iterative testing and collaboration with diverse stakeholders to ensure the app meets real needs and is both accessible and easy to use, along with refinements that simplify interfaces and enhance functionality. Overcoming these barriers involves a combination of technical improvements, user education and fostering confidence through reliable performance and transparent development processes.
