16:47, 3rd November 2023
A SAS RTF Parser Macro with the Aid of an R Package
Extracting data from Rich Text Format (RTF) tables is a common requirement in clinical trial reporting, particularly for validating tables and comparing different versions. A method involving the R package textreadr simplifies this process by parsing RTF files into data frames, which are then converted to SPSS format for import into SAS.
This approach includes steps such as reading the RTF file, adjusting column delimiters, converting factor variables to character format and saving the output. A SAS macro, %RTFParser, integrates this R-based parsing into SAS workflows by calling the textreadr function through a custom interface, allowing the parsed data to be split into individual variables for further analysis. The solution relies on the flexibility of the textreadr package and demonstrates how combining R and SAS can streamline data extraction from complex RTF structures.
16:33, 31st July 2023
Securing your CDN: Why and how should you use SRI
Relying on third-party resources through a Content Delivery Network (CDN) can expose websites to security risks if those external sources are compromised, potentially allowing malicious code to affect users. Sub-Resource Integrity (SRI) addresses this by enabling developers to verify that external scripts and stylesheets remain unchanged when loaded.
This is achieved by adding an integrity attribute to HTML elements, which contains a cryptographic hash of the expected resource. If the remote file differs even slightly, the browser will block its execution, preventing unauthorised modifications.
Implementing SRI involves specifying the hash, typically obtained using tools like an SRI hash generator, and including the crossorigin attribute to ensure anonymous requests. While SRI enhances security, it requires explicitly defining resource versions, which limits automatic updates and may cause functionality issues if a critical file becomes unavailable. Despite these limitations, SRI provides a robust defence against compromised third-party content, ensuring that only verified versions of external resources are used, thereby protecting both websites and their users.
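As a rough illustration of the hashing step, the Python sketch below computes an integrity value in the form SRI expects: a digest of the file's bytes, Base64-encoded and prefixed with the algorithm name. The function names sri_hash and script_tag and the example URL are hypothetical and do not come from the article. In practice the bytes would be those of the exact file version served by the CDN.

```python
import base64
import hashlib
import html

# Hash algorithms permitted for SRI integrity values.
SRI_ALGORITHMS = {"sha256", "sha384", "sha512"}


def sri_hash(content: bytes, algorithm: str = "sha384") -> str:
    """Return an integrity value of the form '<algorithm>-<base64 digest>'."""
    if algorithm not in SRI_ALGORITHMS:
        raise ValueError(f"unsupported SRI algorithm: {algorithm}")
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def script_tag(url: str, content: bytes) -> str:
    """Build a script element carrying the integrity and crossorigin attributes."""
    return (
        f'<script src="{html.escape(url, quote=True)}" '
        f'integrity="{sri_hash(content)}" crossorigin="anonymous"></script>'
    )


if __name__ == "__main__":
    # Hypothetical URL and file contents, used purely for illustration.
    body = b"console.log('hello');"
    print(script_tag("https://cdn.example.com/lib@1.2.3/lib.min.js", body))
```

Because the hash covers every byte, any change to the hosted file, including a legitimate upgrade, yields a different value. This is why SRI works best with URLs that pin a specific version.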
19:53, 4th July 2023
LangChain is an open-source framework designed to simplify the development of agents and applications powered by large language models, allowing developers to connect to providers such as OpenAI, Anthropic and Google in fewer than ten lines of code. It offers a standardised interface for interacting with different models, helping to avoid provider lock-in, and its agent architecture is built on top of LangGraph, a lower-level orchestration framework that supports durable execution, human-in-the-loop workflows and persistence.
For those with more straightforward needs, LangChain handles the complexity of LangGraph under the hood, meaning developers do not need direct knowledge of it for basic use. A higher-level option called Deep Agents is also available, offering additional capabilities such as automatic compression of long conversations, a virtual file system and subagent-spawning, while LangGraph itself remains the recommended choice for advanced use cases requiring a combination of deterministic and agentic workflows with heavy customisation. Debugging and tracing of agent behaviour can be carried out using LangSmith, which provides visualisation tools, execution path tracing and runtime metrics.
17:01, 14th June 2023
I'm an R user: Quarto or R Markdown?
Posit introduced Quarto as an open-source publishing system that integrates narrative text and code to generate reports, presentations and websites, offering a command-line interface that simplifies rendering documents outside the RStudio IDE. Unlike R Markdown, which relies on multiple packages for different outputs, Quarto consolidates functionality into a single system, reducing dependencies and streamlining workflows.
It supports language-agnostic code execution, enabling collaboration across teams using different programming languages, while features like Hugo-style includes and shortcodes allow reusable content and dynamic variable insertion. Quarto projects provide shared YAML metadata and variables across documents, enhancing consistency and ease of maintenance.
Global code options, such as document-wide settings and freeze functionality, improve efficiency in managing large-scale outputs. Although R Markdown remains supported, Quarto offers advantages for users seeking unified tools and advanced formatting capabilities, particularly in multilingual or collaborative environments.
09:33, 12th May 2023
NodeSource Node.js Binary Distributions
NodeSource has provided DEB and RPM Node.js binary distributions for over a decade, accumulating more than 100 million downloads annually and supporting millions of developers worldwide. The company offers both supported and legacy versions of Node.js, with extended technical assistance available for versions no longer maintained by the open-source project. Alongside its distributions, NodeSource provides a range of services including support, consulting and training, as well as solutions tailored to areas such as API integration, high-performance applications, legacy migration and the Internet of Things.
14:01, 11th May 2023
Julia 1.9 Highlights
Julia 1.9 introduces several enhancements aimed at improving performance, usability and compatibility. The package manager, Pkg, now allows more precise control over updates, with commands like up Foo restricting upgrades to the specified package and add prioritising already installed versions to reduce pre-compilation overhead.
Support for Apple Silicon has been elevated to Tier 1, ensuring robust testing and integration. The update to LLVM v14 enables autovectorisation for SVE/SVE2 extensions on AArch64 CPUs, enhancing performance on hardware like Fujitsu’s A64FX. Native Float16 arithmetic is now supported on AArch64 processors with the necessary hardware support, such as Apple’s M series, offering significant speed improvements in memory-bound applications.
Additional features include faster test coverage tracking by default, better monorepo support through shared manifest files and refined handling of sub-packages within larger projects. These changes collectively aim to streamline development workflows, optimise computational efficiency and expand hardware compatibility.
09:10, 26th April 2023
Shiny User Adoption Fails: 9 Reasons Why Nobody Uses Your App
Low adoption of Shiny apps often stems from multiple interconnected factors, including insufficient awareness among users, unclear communication of the app's purpose, misalignment with actual user needs, usability challenges and resistance to change. Users may not discover the app, fail to understand its benefits, or find it incompatible with their workflows, such as requiring internet access when offline or being poorly optimised for mobile use. Even when users are aware of the app, poor design, lack of intuitive navigation, or trust issues arising from bugs or incorrect data can deter engagement. Addressing these issues requires proactive user research, iterative testing and collaboration with diverse stakeholders to ensure the app meets real needs and is both accessible and easy to use. Additionally, performance bottlenecks, such as slow loading times, or competing with existing tools like Excel may further reduce adoption, necessitating refinements to simplify interfaces and enhance functionality. Overcoming these barriers involves a combination of technical improvements, user education and fostering confidence through reliable performance and transparent development processes.
17:14, 25th April 2023
The find command on Linux is a versatile tool used to locate files and directories based on specific criteria such as name, type, size, permissions, modification date and ownership. It operates using a structured syntax that includes options, paths and expressions, allowing users to search recursively through directory hierarchies. Common applications include locating files by name with -name, filtering by file type with -type, searching for files within a size range using -size and identifying files with particular permissions via -perm.
The command can also perform actions on matched files, such as modifying permissions with -exec or deleting files with -delete, though caution is advised when using destructive operations. Examples demonstrate its use in tasks like finding all JavaScript files in a directory, locating files modified within a specific timeframe, or changing ownership of files owned by a particular user. Its flexibility makes it essential for system administrators and users requiring precise file management on Linux systems.
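To make the matching logic concrete, here is a minimal Python analogue of a search such as find with -type f, -name and a minimum size. It is a swapped-in illustration rather than the find command itself: the function find_files and its parameters are hypothetical, and the size test is a plain byte comparison that does not reproduce the unit rounding of -size.

```python
import fnmatch
import os


def find_files(root, name_pattern="*", min_size=0):
    """Yield regular files under root whose basename matches name_pattern
    (case-sensitive glob) and whose size is at least min_size bytes."""
    # os.walk descends recursively through the directory hierarchy.
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            # Glob match on the basename only, as -name does.
            if not fnmatch.fnmatchcase(filename, name_pattern):
                continue
            path = os.path.join(dirpath, filename)
            # Keep regular files only and skip symbolic links.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            if os.path.getsize(path) >= min_size:
                yield path


if __name__ == "__main__":
    # List JavaScript files of at least 1 KiB below the current directory.
    for match in find_files(".", "*.js", min_size=1024):
        print(match)
```

The generator only reports matches. Any action on them, the counterpart of -exec or -delete, is left to the caller, which keeps destructive steps explicit.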
14:51, 23rd April 2023
Hugo: Shortcodes are small snippets placed within content files that are rendered using predefined templates, allowing authors to incorporate raw HTML or additional formatting into Markdown content. The shortcodes covered in this article span a wide range of functions, including managing HTML abbreviations and anchor links, displaying styled blockquotes and code examples with syntax highlighting, applying inline colour styling, embedding images in PNG, JPG, TIFF and SVG formats with resizable options, rendering keyboard key indicators and generating internal page links with optional anchor targeting. Additional shortcodes handle styled alert and note blocks in varying colours to convey danger, information, success, tips and warnings, as well as tag linking and links to Hugo's official documentation, OpenBSD manual pages and Wikipedia articles. All shortcodes are stored in the layouts/shortcodes/ directory, with referenced content files placed in a dedicated folder within the content directory, and the implementation described here was built using Hugo version 0.53 on OpenBSD 6.6.
11:06, 15th April 2023
PHP: apache_request_headers
The apache_request_headers() function, available since PHP 4.3.0, returns all HTTP headers of the current request as an associative array. It works in Apache, FastCGI, CLI and FPM server environments, with FPM support added in PHP 7.3.0. The function takes no parameters. Because HTTP header field names are case-insensitive by standard, developers should compare the returned keys case-insensitively rather than relying on consistent capitalisation. Where the function is unavailable, a common workaround is to iterate over the $_SERVER superglobal, extract the keys prefixed with HTTP_ and reformat them to approximate the original header names; an alternative is to use Apache's mod_rewrite to pass specific headers into the PHP environment as server variables.
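The HTTP_ prefix workaround is a simple key transformation, sketched here in Python over a plain dictionary standing in for $_SERVER. The function names and sample values are hypothetical. The reconstructed capitalisation is only an approximation, since the original casing of a header name cannot be recovered from the server variable.

```python
def headers_from_server_vars(server):
    """Rebuild approximate header names from CGI-style HTTP_* variables."""
    headers = {}
    for key, value in server.items():
        if key.startswith("HTTP_"):
            # HTTP_USER_AGENT -> USER-AGENT -> User-Agent
            name = key[len("HTTP_"):].replace("_", "-").title()
            headers[name] = value
    return headers


def get_header(headers, name, default=None):
    """Look up a header by name, ignoring case."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return default


if __name__ == "__main__":
    # Sample values, invented for illustration.
    server = {
        "HTTP_USER_AGENT": "ExampleBrowser/1.0",
        "HTTP_X_FORWARDED_FOR": "203.0.113.7",
        "REQUEST_METHOD": "GET",
    }
    headers = headers_from_server_vars(server)
    print(headers)
    print(get_header(headers, "user-agent"))
```

The case-insensitive lookup in get_header reflects the advice above: code that reads headers should not depend on a particular capitalisation of the keys.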