Technology Tales

Notes drawn from experiences in consumer and enterprise technology

15:12, 28th November 2021

Julia in VS Code

Julia for Visual Studio Code is a free, open-source extension that turns VS Code into an integrated development environment for the Julia programming language, combining ease of use with strong performance to suit both beginners and experienced developers. The extension is officially supported within VS Code and aims to reduce the complexity of programming by providing a fully live environment. Installation guidance, configuration instructions and release notes are available in the documentation, which is currently being revised. Users who encounter problems are encouraged to consult the FAQ before raising issues on the relevant GitHub repository or the Julia Discourse forum, where the development community provides support.

15:10, 28th November 2021

Is case_when needed in DataFrames.jl?

In Julia, the ternary operator can replicate the functionality of the case_when function from the R package dplyr, making a dedicated equivalent unnecessary in DataFrames.jl. Through a series of practical examples using basic value classification and the Star Wars dataset from dplyr, the ternary operator proves capable of handling conditional data transformation across single and multiple columns with comparable syntax.

One notable difference is how the two languages treat missing values. Julia enforces strict Boolean evaluation, so a condition that evaluates to missing raises an error unless it is handled with functions such as isequal or coalesce, whereas case_when in R treats a missing condition as a failing one by default. The Julia approach also has two advantages over case_when: the output values need not share the same type, and only the expressions needed to determine a result are evaluated, rather than all possible expressions, which helps avoid errors when certain values are incompatible with specific operations.

15:10, 28th November 2021

Julia language: a concise tutorial

Created by Antonello Lobianco and supported by the French National Research Agency, this concise tutorial on the Julia programming language serves two purposes: to document the author's own learning of Julia and to provide an accessible entry point for those looking to start coding in the language without having to read through the full official documentation. Pitched somewhere between a traditional tutorial and a quick reference guide, it covers the core elements of the language including data types, control flow, functions, custom structures and error handling, as well as domain-specific topics addressed through a section on useful packages.

The material is aimed at those who already have some familiarity with programming in other languages. Over time it has grown into both a book published by Apress and a free MOOC covering scientific programming and machine learning with Julia. The tutorial maintains compatibility notes across multiple Julia versions dating back to 0.5, and is kept updated to reflect changes in key packages and the language itself.

15:09, 28th November 2021

DataFrames.jl: why do we have both subset and filter functions?

The DataFrames.jl package in Julia includes two row-filtering functions, filter and subset, each suited to different use cases. The filter function follows the standard Julia Base contract, accepting a predicate that operates on individual rows or whole groups and returning a Boolean value, making it well suited for single-condition filtering or for removing entire groups from a GroupedDataFrame.

The subset function, introduced to align more closely with the broader DataFrames.jl ecosystem, always operates on whole columns, accepts multiple conditions simultaneously and returns a data frame rather than a GroupedDataFrame by default. A key distinction is that subset requires each predicate to return a vector of Boolean values rather than a scalar, so a predicate written for a single row has to be wrapped in ByRow. It also supports a skipmissing keyword argument for handling missing values gracefully, something filter does not offer. Both functions have in-place variants, filter! and subset!, and the choice between them largely depends on the complexity of the filtering logic and the structure of the input data.

15:06, 28th November 2021

First-Class Statistical Missing Values Support in Julia 0.7

Julia 0.7 introduced first-class support for statistical missing values, a feature long common in specialised languages such as SQL, R and Stata but rarely found in general-purpose programming languages. The new implementation centres on a dedicated missing object, which is the sole instance of the Missing singleton type, allowing values to be represented as 'Union{Missing,T}' for any type 'T'.

This design closely follows the behaviour of SQL's 'NULL' and R's 'NA', ensuring that missing values propagate safely through standard operators and mathematical functions rather than being silently ignored or substituted, which is a known cause of errors in published scientific work. Arrays containing missing values use a compact memory layout consisting of a pair of arrays, one holding non-missing values and another storing type tags, which keeps memory overhead modest and allows the compiler to generate highly efficient native code.

The framework replaces the earlier Nullable type, which had proven unsuitable for statistical use due to type-stability issues, suboptimal memory layout and conflation of two distinct concepts of nullity. Alongside missing, the nothing object now serves the separate role of representing the absence of a value in a software engineering context.

Convenience functions such as skipmissing and coalesce are provided to handle missing values explicitly when required. While performance is already competitive with or faster than equivalent R operations in numerous instances, further compiler improvements are anticipated, particularly for floating-point arrays and more complex code patterns. The missing values framework is used by version 0.11 of the DataFrames package and represents one of the most complete implementations of statistical missing value support across both general-purpose and specialised languages.

15:05, 28th November 2021

Python: Check if a String Contains a Substring

In Python, there are several ways to check whether a string contains a substring. The simplest approach is the in operator, which returns a Boolean value and is recommended for most use cases. It does raise a TypeError when the string being searched is None, so a None check is needed wherever that can happen.

The index() method can also be used, returning the position of the first occurrence of the substring but requiring exception handling for cases where the substring is not found. A more convenient alternative is the find() method, which returns the index of the substring or -1 if no match is found, removing the need for exception handling.
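The three approaches described so far can be compared side by side in a minimal sketch. The helper names (contains, position_with_index, position_with_find) and the sample sentence are made up for illustration and do not come from the original article.

```python
from typing import Optional


def contains(text: Optional[str], needle: str) -> bool:
    """Return True if needle occurs in text, treating None as 'no match'."""
    # `needle in None` would raise a TypeError, hence the explicit None check.
    return text is not None and needle in text


def position_with_index(text: str, needle: str) -> int:
    """Locate needle with str.index(), mapping its ValueError to -1."""
    try:
        return text.index(needle)
    except ValueError:
        # str.index() raises ValueError when the substring is absent.
        return -1


def position_with_find(text: str, needle: str) -> int:
    """Locate needle with str.find(), which already returns -1 on a miss."""
    return text.find(needle)


sentence = "Julia and Python"  # hypothetical sample text
print(contains(sentence, "Python"))
print(contains(None, "Python"))
print(position_with_index(sentence, "Python"))
print(position_with_find(sentence, "Ruby"))
```

The in operator is enough when only a yes-or-no answer is needed, while find() is the lighter choice when the position matters and a miss is an expected outcome.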

For more complex scenarios, such as case-insensitive matching or searching large datasets, Python's built-in re module provides regular expression support through its search() function, though this approach is slower and more complex than the simpler methods and is therefore best reserved for situations where straightforward substring matching is insufficient.
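As a rough sketch of the case-insensitive scenario, the function below wraps re.search() with the re.IGNORECASE flag. The function name is hypothetical, and the use of re.escape() is an assumption made here so that the search term is matched literally and not interpreted as a pattern.

```python
import re


def contains_ignoring_case(text: str, needle: str) -> bool:
    """Return True if needle occurs anywhere in text, ignoring letter case."""
    # re.escape() makes characters such as "." or "*" match themselves.
    pattern = re.escape(needle)
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


print(contains_ignoring_case("Technology Tales", "tales"))
print(contains_ignoring_case("Technology Tales", "stories"))
```

For a plain case-insensitive check, comparing lower-cased copies with the in operator may be simpler, so the regular expression route is mainly worthwhile once real patterns are involved.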

15:04, 28th November 2021

How to COALESCE in Pandas

In Python's Pandas library, the combine_first() method can be used to replicate a COALESCE function, returning the first non-null value across two columns. When applied to a DataFrame, it will return the values of a primary column and fall back to the corresponding value of a secondary column wherever null values are encountered.
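A small sketch of that fallback behaviour is shown here, assuming pandas is installed. The column names (primary, secondary, coalesced) and the sample values are invented for illustration.

```python
import pandas as pd

# Hypothetical data: "primary" has gaps that "secondary" can partly fill.
frame = pd.DataFrame(
    {
        "primary": ["a", None, "c", None],
        "secondary": ["w", "x", None, None],
    }
)

# Take the value from "primary" and fall back to "secondary" where it is null.
frame["coalesced"] = frame["primary"].combine_first(frame["secondary"])

print(frame)
```

Rows where both columns are null stay null in the result, which matches what COALESCE does when every argument is NULL.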

08:58, 26th November 2021

A fresh start for R in VSCode

Visual Studio Code (VSCode) can be configured as a fully functional development environment for R programming by installing a few key components. The process involves installing the languageserver package from the R console, followed by two VSCode extensions, namely the R Extension by Yuki Ueda and the R LSP Client by REditorSupport, which together provide features such as shortcuts, environment viewing, linting, autocomplete and intelligent function suggestions. A Python package called radian serves as a modern alternative to the standard R console, offering improved colour schemes and output representations. Once these components are in place, a handful of configuration lines added to the editor's settings file complete the setup, resulting in a lightweight yet capable environment that rivals a dedicated R IDE.

08:57, 26th November 2021

R and radian on macOS and VSCode

Setting up R with VSCode on macOS requires several components to work together correctly. R itself must be installed from the official CRAN repository alongside XQuartz, while radian, a popular alternative R console, can be installed via its GitHub repository.

A common issue on macOS is a runtime error related to a missing R home directory, which can be resolved by locating the correct path within the R application and exporting it as an environment variable, then saving that setting permanently in a shell configuration file. The R extension for VSCode, developed by Yuki Ueda, enables running R code directly from the editor, though it requires radian to be correctly configured within the extension settings along with bracketed paste being enabled.

Additional configuration, such as setting the correct R executable path, allows the help topic viewer and function helper to work properly within the editor. For those wanting linting and autocompletion support, the R LSP Client extension by REditorSupport can be installed alongside the corresponding language server from CRAN.

08:57, 26th November 2021

Writing R in VSCode: Interacting with an R session

Using R interactively within VSCode is made possible by the R session watcher feature of the vscode-R extension, which allows the editor to communicate with a live R session across a range of scenarios. For a smoother experience, using radian as the default R console is recommended, as it offers syntax highlighting, auto-completion and improved handling of code chunks and Unicode characters.

When connecting to a remote server via SSH and managing R sessions within a tmux window, several configuration options help ensure that code is always sent to the active terminal and that sessions persist beyond the editor closing. Multiple R sessions can be run simultaneously across different tmux windows and safely restored when needed.

Document formatting based on styler is available but can be slow for larger scripts, so disabling automatic formatting on save and running it manually on demand is a practical alternative. The overall setup supports a wide range of interactive R features, including viewing data frames, global environments, functions, vectors and objects, as well as displaying plots, htmlwidgets, Shiny applications and code profiling results, alongside session symbol completion and help documentation.
