22:14, 21st April 2025
The sassy system is a collection of R packages aimed at enhancing productivity, particularly for users familiar with SAS® software. Addressing limitations in areas such as logging, value formatting, data management, and report creation, these packages introduce concepts and workflows inspired by SAS to streamline R programming. The system includes key packages for logging, libraries, data formatting, reporting, and general utilities, along with specialised tools for tasks like generating define.xml files, accessing data, and creating analysis datasets.
22:11, 21st April 2025
Claude Code is an agentic coding assistant from Anthropic that integrates directly into the terminal, helping developers understand, edit and manage their codebases through natural language commands. After installing Node.js 18+ and Claude Code, users authenticate with their Anthropic Console account to access features such as file editing, bug fixing, codebase querying, Git operations, test execution and deeper project analysis, all within a secure environment. Claude Code runs on macOS, Ubuntu, Debian and Windows (via WSL). It uses a tiered permissions system, memory management for persisting preferences, and built-in tools for code search and modification, along with safeguards against command injection and other security risks. It can be customised at both global and project levels, supports automation in CI pipelines, accommodates various shell environments, and offers Vim mode, cost tracking and model selection. It is also compatible with Amazon Bedrock, Google Vertex AI, proxies and devcontainers. Privacy is emphasised, with limited retention of feedback data, which is used solely to improve functionality and not for model training. Extensive configuration, troubleshooting and notification options make it adaptable for individual developers and teams alike.
22:56, 30th March 2025
While ActiveState has moved on from its Komodo and OpenKomodo editors, you can still find the code on GitHub. With all the other options out there, it is difficult to see why anyone would use these tools, save as an escape from the relentless encroachment of AI into everything we do.
19:31, 28th March 2025
It’s astonishing how much trouble a misplaced wp-config.php file can cause. Today, one found its way in here while I was setting up a test blog to see how WordPress 2.7 was coming along. The result was that content meant for my hillwalking blog turned up in web browsers and feeds in place of what should have appeared here. I’ll have to be more careful in future…
P.S. I realise that I have been quiet over the last few weeks, but that’s down to my being away in Scotland hiking on some of its islands and catching up with some friends in Edinburgh. I have some ideas for new posts, so they should manifest themselves sooner rather than later.
15:03, 27th March 2025
An overview of the current state of statistical programming recruitment shows a field undergoing rapid transformation, driven by increasingly complex clinical trials, heightened regulatory expectations and technological advances. Demand is rising for programmers who are skilled in languages beyond SAS, knowledgeable about automation and AI integration, and experienced in handling real-world evidence. To remain competitive, companies must adapt their recruitment strategies to attract candidates who combine technical proficiency with a capacity for cross-functional collaboration, prioritising those who can manage complex data, ensure regulatory compliance and work effectively across teams. A balanced mix of permanent hires and contract specialists can also provide the flexibility needed for high-demand projects while maintaining core expertise for ongoing operations. Companies that align their hiring processes with these evolving needs will be better placed to navigate regulatory shifts and technological change, ensuring data integrity and timely project delivery in a competitive landscape. Engaging specialist recruitment services can help identify candidates who meet these exacting standards, and so help organisations build robust, future-ready statistical programming teams.
21:54, 19th March 2025
Pygments is a versatile, open-source syntax highlighting library that supports 598 programming languages and other formats. It can be used both as a command-line tool and as a library, and it produces output in several formats, including HTML, RTF, LaTeX and ANSI escape sequences. The project places particular emphasis on highlighting quality and makes it relatively straightforward to add support for new languages, since most of its lexers rely on a regex-based tokenisation mechanism. Available from the Python Package Index, it is maintained by Georg Brandl, Matthäus Chajdas and Jean Abou-Samra, with contributions from various other developers over the years.
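To illustrate the regex-based approach, here is a minimal sketch of a custom lexer, assuming Pygments has been installed from PyPI. `TinyLexer` and its `let`/`print` syntax are hypothetical, invented purely for this example. The lexer subclasses Pygments' `RegexLexer`, declares ordered (regex, token type) rules in a `tokens` dictionary, and is then passed to `highlight()` with `HtmlFormatter` to produce HTML.

```python
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexer import RegexLexer
from pygments.token import Keyword, Name, Number, Operator, Text


class TinyLexer(RegexLexer):
    """Hypothetical lexer for a toy language with `let` and `print` statements."""

    name = "Tiny"
    aliases = ["tiny"]

    # Each state maps to an ordered list of (regex, token type) rules;
    # at each position, the first rule whose regex matches wins.
    tokens = {
        "root": [
            (r"\s+", Text),
            (r"\b(let|print)\b", Keyword),  # must precede the identifier rule
            (r"\d+", Number),
            (r"[A-Za-z_]\w*", Name),
            (r"=", Operator),
        ],
    }


source = "let x = 42\nprint x\n"

# Raw (token type, text) pairs produced by the regex tokeniser
tokens = list(TinyLexer().get_tokens(source))
for token_type, value in tokens:
    print(token_type, repr(value))

# The same lexer plugged into the standard highlight() pipeline
html = highlight(source, TinyLexer(), HtmlFormatter())
print(html)
```

Because rules are tried in order, the keyword rule has to come before the general identifier rule; with the two swapped, `let` and `print` would be tagged as plain names.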
20:30, 17th March 2025
When securing database credentials in R, several methods are available; in order of preference, they are as follows. Integrated security with a DSN is best, since it requires no credentials in code at all. Without a DSN, you can still use integrated security by passing the connection settings as arguments. When credentials must be stored, the keyring package uses your operating system's credential store (Keychain on macOS, Credential Store on Windows, the Secret Service API on Linux) to keep them encrypted and retrieve them securely. The config package allows credentials to be kept in a separate config.yml file rather than in the script itself. Environment variables can be set in the .Renviron file and retrieved with Sys.getenv(). Base R's options() function can hold credentials temporarily for the duration of a session. As a last resort, the RStudio IDE can prompt users for credentials using masked input. Each method carries different security implications and implementation complexity, but all aim to keep sensitive information out of plaintext code.
19:52, 17th March 2025
Unicode character encodings
When working with files in Python, data are read from disk as raw bytes, which are then decoded into strings using a character encoding; writing strings to a file reverses the process by encoding them into bytes. If no encoding is given, the default depends on the operating system and locale: UTF-8 is common on Unix-based systems, while CP1252 is common on Windows. Specifying the encoding explicitly when opening files (with the encoding argument to open()) is therefore recommended, to avoid problems such as a UnicodeDecodeError or mojibake, the garbled text that results when bytes are decoded with a different encoding from the one used to write them. Careful encoding management ensures accurate handling of data, especially non-ASCII characters, and prevents errors caused by differing default encodings across platforms.
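A short sketch of these points, using only the standard library: it writes a string containing accented characters with an explicit UTF-8 encoding, then reads the same bytes back three ways. The sample text, file name and choice of CP1252 and ASCII as the "wrong" encodings are arbitrary assumptions made for illustration.

```python
import locale
import tempfile
from pathlib import Path

text = "café naïve résumé"

# The encoding open() falls back to when none is given (platform- and locale-dependent)
print("Default encoding:", locale.getpreferredencoding(False))

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.txt"

    # Writing: str -> bytes, encoded explicitly as UTF-8
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    raw = path.read_bytes()  # the bytes actually stored on disk

    # Reading with the matching encoding round-trips correctly
    with open(path, encoding="utf-8") as f:
        good = f.read()

    # Reading with a mismatched encoding raises no error but yields mojibake
    with open(path, encoding="cp1252") as f:
        garbled = f.read()

    # Reading with an encoding that cannot represent these bytes fails outright
    error = None
    try:
        with open(path, encoding="ascii") as f:
            f.read()
    except UnicodeDecodeError as exc:
        error = exc

print(raw)      # b'caf\xc3\xa9 na\xc3\xafve r\xc3\xa9sum\xc3\xa9'
print(good)     # café naïve résumé
print(garbled)  # cafÃ© naÃ¯ve rÃ©sumÃ©
print(error)
```

The CP1252 read is the more insidious failure: each two-byte UTF-8 sequence is silently decoded as two separate characters, so nothing signals a problem until someone looks at the text.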
14:56, 24th February 2025
Gerrit Code Review
Gerrit Code Review has its roots in Google's internal Mondrian tool, a proprietary peer-review system built on Perforce that became highly valued among Google engineers. Guido van Rossum later open-sourced elements of Mondrian as Rietveld, a similar but advisory tool designed for use with Subversion and hosted on Google App Engine.
When the Android Open Source Project adopted Git as its primary version control system, engineers familiar with Mondrian sought equivalent functionality for their new environment. Gerrit began life as a set of patches to Rietveld before diverging far enough to warrant its own identity, taking its name from the Dutch architect Gerrit Rietveld. The project underwent a substantial rewrite in version 2.x, moving from Python on App Engine to Java running in a J2EE servlet container with a SQL database. It was revised again in version 3.x, which replaced the SQL database with NoteDb, storing all metadata in Git, and migrated the user interface from GWT to Polymer.
02:04, 16th February 2025
Apply Functions in R with Examples [apply(), sapply(), lapply(), tapply()]
The apply family of functions in R provides a more concise alternative to traditional loops for performing operations on data structures such as lists, matrices, arrays and data frames. These functions, including apply(), lapply(), sapply() and tapply(), let users apply custom or built-in functions across the elements of a dataset, which often reduces code complexity and can improve performance, particularly with large datasets.
The apply() function operates on matrices and arrays, applying a function across rows, columns or both. lapply() applies a function to each element of a vector, list or data frame and always returns a list. sapply() works like lapply() but simplifies the result to the simplest suitable structure, such as a vector or matrix, depending on the input. tapply() computes summary statistics for groups defined by factors, which makes it useful for categorical data analysis. The examples show how these functions can calculate aggregates such as means and sums or apply transformations, with the structure of the output depending on the function and input type, highlighting their versatility in data manipulation and analysis.