17:21, 15th February 2025
After checking in with R again as part of getting client work on the go once more, I set about installing R and RStudio on a new machine. Things went less smoothly when I tried to add packages: some system dependencies were missing. With so much output shown in red against a black console, the underlying problem was difficult to spot. Handily, AI had a use here, and Google Gemini is turning out to be very useful when I have some debugging to do. All got sorted on this occasion; it might help to harvest a list of the packages involved so that I have them for future reference.
22:04, 6th February 2025
The January 2025 release of Visual Studio Code, version 1.97, brings numerous updates aimed at improving developer productivity and security. Notable features include GitHub Copilot's Next Edit Suggestions, which predict the next edit a developer is likely to make, along with workbench changes such as a repositionable Command Palette and improved log filtering. On the security side, the update introduces extension publisher trust, while compound log views make it easier to analyse several logs together. Developers can now debug Python scripts without any setup and benefit from richer git blame information and support for more source control actions. Accessibility has also been refined, with clearer sounds and additional keyboard shortcuts for easier navigation. The update further includes more customisable terminal settings, debugging improvements and assorted refinements to documentation and syntax highlighting. Remote development benefits from better SSH configuration support, and community contributions have helped streamline the codebase and the development workflow.
14:22, 17th December 2024
Steve's Data Tips and Tricks provides a comprehensive guide to using the na.omit() function in R to manage missing values effectively in vectors, matrices, and data frames. Missing values, often represented as "NA", can arise from various issues such as data collection errors and incomplete surveys, which can adversely affect statistical calculations, model accuracy, and data visualisation. The guide explains the basic usage of the na.omit() function, its syntax, and how it can be applied to vectors and data frames for removing incomplete cases. It offers practical examples, advanced applications like conditional removal, and best practices, such as backing up original data and considering the implications of data removal. The guide addresses FAQs, highlighting that while na.omit() is effective, alternative methods exist for handling missing values, and ultimately emphasises the importance of documenting strategies for managing NA values in data analysis.
11:55, 25th November 2024
How to run R in Visual Studio Code
Setting up R in Visual Studio Code involves installing specific extensions, configuring settings and adjusting terminal preferences. The main attraction is GitHub Copilot integration, which offers more extensive AI-assisted coding than is available in RStudio. While the process requires additional steps, such as installing the Python-based radian console and some R packages, it provides access to features such as interactive data previews, colour pickers and customisable code snippets. That makes it a viable alternative for users seeking advanced AI capabilities or working with multiple programming languages. The experience, though more complex than RStudio's streamlined setup, offers flexibility and tools that may appeal to developers who prioritise Copilot's functionality or hybrid workflows.
16:14, 18th November 2024
Although SAS does not produce a version of its software that runs natively on Apple Mac hardware, there are several supported methods that allow Mac users to access its features. Free cloud-based options include SAS OnDemand for Academics, SAS Viya for Learners and SAS Viya Workbench for Learners, all of which can be accessed through a supported browser such as Chrome or Firefox. Visual Studio Code, which runs well on a Mac, supports a SAS extension that enables remote connections to both SAS 9.4 and SAS Viya sessions.
Users can also run SAS for Windows on a Mac through virtualisation software such as Parallels or VMware, though this is not an officially tested configuration and SAS Technical Support will not assist with Mac-specific setup issues. It is worth noting that SAS 9.4 and its associated client applications are incompatible with ARM-based Mac chips, as they are built for Intel x64 architecture. SAS Analytics Pro, which is container-based and deployed using Docker, can be run on macOS and accessed via a local browser. Finally, those who require interactive statistical software that installs and runs natively on a Mac may wish to consider JMP, a separate product from SAS that is available for both Windows and Mac operating systems.
10:58, 8th November 2024
10 Python One-Liners That Will Boost Your Data Science Workflow
Python offers a wide range of tools and techniques that can streamline data science workflows, many of which can be written in a single line of code. Pandas' fillna method can be combined with conditional logic to fill missing numerical values automatically with the column median and categorical ones with the mode, while highly correlated features can be removed using a one-line correlation filter. New columns based on multiple conditions can be generated concisely using the apply method with lambda functions, and Python's built-in set type allows quick identification of common or differing elements across datasets.
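A minimal sketch of the pandas techniques just described follows. The data, the column names, the income bands and the 0.9 correlation threshold are all hypothetical choices for illustration, and the article's own one-liners may be written differently.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with gaps in numeric and categorical columns
df = pd.DataFrame({
    "age": [25.0, None, 40.0, 31.0],
    "income": [30000.0, 42000.0, None, 55000.0],
    "city": ["Leeds", None, "York", "Leeds"],
})

# Fill numeric columns with their median and all other columns with their mode
df = df.apply(lambda col: col.fillna(col.median())
              if pd.api.types.is_numeric_dtype(col)
              else col.fillna(col.mode()[0]))

# New column from several conditions using apply with a lambda (row by row)
df["band"] = df.apply(
    lambda r: "high" if r["income"] > 50000 else ("mid" if r["income"] > 35000 else "low"),
    axis=1,
)

# Correlation filter: drop any feature correlated above 0.9 with an earlier one
features = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [2, 4, 6, 8, 10], "z": [5, 3, 4, 1, 2]})
corr = features.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))  # upper triangle only
to_drop = [c for c in upper.columns if (upper[c] > 0.9).any()]
reduced = features.drop(columns=to_drop)

# Set intersection to find values common to two datasets
common = set(df["city"]) & {"York", "Hull"}
```

Row-wise `apply` reads well but runs a Python function per row, so on large frames a vectorised option such as `numpy.select` is usually faster.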
NumPy boolean masks provide a versatile way to filter arrays, and the Counter class from the collections module offers a rapid means of calculating value frequencies within a list. Regular expressions paired with map can extract numerical values from strings, nested lists can be flattened using the sum function, and two lists can be merged into a dictionary using zip and dict together. Finally, multiple dictionaries can be consolidated into one using dictionary unpacking, making it straightforward to aggregate structured data for further preprocessing and analysis.
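The standard-library one-liners from this paragraph might look something like the sketch below; the sample values and variable names are made up for illustration.

```python
import re
from collections import Counter

# Value frequencies with Counter
counts = Counter(["red", "blue", "red", "green", "red"])

# Extract numbers from a string with a regular expression, converting each via map
prices = list(map(float, re.findall(r"\d+(?:\.\d+)?", "Tea 2.50, cake 3, scone 1.75")))

# Flatten one level of nesting with sum (the start value must be an empty list)
flat = sum([[1, 2], [3], [4, 5]], [])

# Merge two lists into a dictionary with zip and dict
lookup = dict(zip(["a", "b", "c"], [1, 2, 3]))

# Consolidate dictionaries by unpacking; later keys overwrite earlier ones
defaults = {"sep": ",", "header": True}
overrides = {"sep": ";"}
settings = {**defaults, **overrides}
```

Flattening with `sum` builds a new list at every step, so for long inputs `itertools.chain.from_iterable` is the more efficient choice.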
20:33, 2nd October 2024
How to Create Interactive Visualisations in R
Creating interactive visualisations in R enhances data exploration by allowing users to manipulate and analyse information dynamically. Packages such as DT, Plotly and Leaflet enable the development of interactive tables, charts and maps, offering features like sorting, filtering, zooming and hovering for detailed insights. For instance, DT facilitates sortable and searchable tables, Plotly transforms static plots into interactive scatter and bar charts and Leaflet generates maps with clickable markers displaying regional data.
These tools support tasks such as comparing life expectancy across continents, visualising population distributions, or examining GDP trends, thereby simplifying complex data analysis and improving the clarity of presentations. By leveraging these packages, users can create engaging visual outputs that aid in uncovering patterns and communicating findings effectively.
16:10, 26th September 2024
Regular Expressions 101 is an interactive online tool designed to help developers learn, test and debug regular expressions, the pattern-matching syntax widely used across programming languages and command-line tools. It features a live editor in which a pattern can be entered alongside sample data, with matches highlighted in real time and a plain language explanation generated that breaks down each component of the expression, covering constructs such as character classes, quantifiers, groups and anchors.
A step-by-step debugger traces how the regex engine processes input character by character, which proves particularly useful when troubleshooting complex or unexpected behaviour. The tool supports multiple regex engines including PCRE, JavaScript and Python, allowing differences between implementations to be explored directly. Users can also save expressions, build a small library of tested patterns and generate shareable links.
Common use cases include parsing logs, validating input, extracting data and writing search expressions for tools such as grep or various text editors. By combining testing, explanation and debugging in a single environment, the tool has become a widely used reference for both learning regex syntax and resolving patterns that do not behave as expected.
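As an illustration of the log-parsing use case, here is a small pattern of the kind one might paste into the tool, written with Python's `re` module. The log format is hypothetical. Note that the `(?P<name>...)` named-group syntax is Python-style (PCRE also accepts it) while JavaScript expects `(?<name>...)`, which is exactly the sort of engine difference worth checking in the tool.

```python
import re

# Hypothetical log format, e.g. "2024-09-26 16:10:03 ERROR Disk full on /dev/sda1"
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\s+"  # ^ anchor, \d character class, {n} quantifiers
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+"   # named group for the time
    r"(?P<level>[A-Z]+)\s+"             # character class with a + quantifier
    r"(?P<message>.*)$"                 # rest of the line up to the $ anchor
)

def parse(line):
    """Return the named groups as a dict, or None if the line does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None

lines = [
    "2024-09-26 16:10:03 ERROR Disk full on /dev/sda1",
    "2024-09-26 16:10:04 INFO Retrying",
    "garbled line",
]
errors = [p["message"] for p in map(parse, lines) if p and p["level"] == "ERROR"]
```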
14:24, 25th September 2024
TimeAPI is a service offering developers and businesses access to accurate, real-time global time and time zone data through a pair of dedicated APIs. The Time Zone API enables users to retrieve time zone information, convert times between locations and track daylight saving time changes, while the Time API provides the current time for any location worldwide. Documentation is presented through a Swagger interface to make integration straightforward, and the service supports several time formats, including ISO 8601, Unix time and Julian Day. The platform reports handling approximately 2.5 billion requests per month, with response times under one millisecond and an uptime of 99.99%.
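The sketch below does not call TimeAPI (its endpoint paths and response fields are not reproduced here). It only shows, using Python's standard library, how the three formats mentioned relate for a single UTC instant. The function name `time_formats` is hypothetical.

```python
from datetime import datetime, timezone

UNIX_EPOCH_JD = 2440587.5  # Julian Day at 1970-01-01T00:00:00Z
SECONDS_PER_DAY = 86400

def time_formats(dt):
    """Express one instant as ISO 8601, Unix time and Julian Day (leap seconds ignored)."""
    dt = dt.astimezone(timezone.utc)  # normalise any aware datetime to UTC
    unix = dt.timestamp()             # seconds since the Unix epoch
    return {
        "iso8601": dt.isoformat(),
        "unix": unix,
        "julian_day": unix / SECONDS_PER_DAY + UNIX_EPOCH_JD,
    }

# The J2000 reference instant, 2000-01-01 12:00 UTC, is Julian Day 2451545.0
j2000 = time_formats(datetime(2000, 1, 1, 12, 0, tzinfo=timezone.utc))
```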
21:45, 22nd September 2024
How to delete all files in a directory with Python?
Deleting files in a directory using Python can be achieved through built-in modules that offer different approaches for managing file removal. The os module provides functions like listdir() and remove() to iterate over files and delete them individually, ensuring simplicity and readability. Introduced in Python 3.4, the pathlib module uses an object-oriented approach with methods such as iterdir() and unlink() to perform similar tasks. For scenarios involving subdirectories, the shutil module extends functionality by allowing the deletion of both files and folders recursively.
Each method requires careful validation of file paths and permissions to avoid unintended data loss, particularly when handling large-scale or automated operations. The choice of module depends on the specific requirements, such as whether to remove only files, include subdirectories, or maintain structured control over the deletion process.
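The three approaches might be sketched as below. The helper names are hypothetical, and the demonstration runs inside a throwaway temporary directory so nothing real is touched. In practice it would be worth validating the target path before calling any of them.

```python
import os
import shutil
import tempfile
from pathlib import Path

def delete_files_os(directory):
    """Delete files (and symlinks) directly inside directory; keep subdirectories."""
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path) or os.path.islink(path):
            os.remove(path)

def delete_files_pathlib(directory):
    """Same behaviour as delete_files_os, using pathlib's iterdir() and unlink()."""
    for entry in Path(directory).iterdir():
        if entry.is_file() or entry.is_symlink():
            entry.unlink()

def delete_contents(directory):
    """Delete files and subdirectories recursively, keeping directory itself."""
    for entry in Path(directory).iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # removes the subdirectory and everything in it
        else:
            entry.unlink()        # files, and symlinks (only the link is removed)

# Demonstration in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "sub" / "c.txt").write_text("c")

    (root / "a.txt").write_text("a")
    delete_files_os(root)
    after_os = sorted(p.name for p in root.iterdir())

    (root / "b.log").write_text("b")
    delete_files_pathlib(root)
    after_pathlib = sorted(p.name for p in root.iterdir())

    delete_contents(root)
    after_contents = sorted(p.name for p in root.iterdir())
```

The first two functions leave subdirectories in place, whereas `delete_contents` uses `shutil.rmtree` to clear them as well while keeping the top-level directory.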