Procedural programming languages

Online R programming books that are worth bookmarking

23^rd March 2026

As part of making content more useful following its reorganisation, numerous articles on the R statistical computing language have appeared on here. All of those have taken a more narrative form. With this collation of online books on the R language, I take a different approach. What you find below is a collection of links with associated descriptions. While narrative accounts can be very useful, there is something handy about running one's eye down a compilation as well. Many entries have a corresponding print edition, some of which are not cheap to buy, which makes me wonder about the economics of posting the content online as well, though it can help with getting feedback during book preparation.

Big Book of R

We start with this comprehensive collection of over 400 free and affordable resources related to the R programming language, organised into categories such as data science, statistics, machine learning and specific fields like economics and life sciences. In many ways, it is a superset of what you find below and complements this collection with many other finds. The fact that it is a living collection makes it even more useful.

R Programming for Data Science

Here is an introduction to the R programming language, focusing on its application in data science. It covers foundational topics such as installation, data manipulation, function writing, debugging and code optimisation, alongside advanced concepts like parallel computation and data analysis case studies. The text includes practical guidance on handling data structures, using packages such as {dplyr} and {readr} as well as working with dates, times and regular expressions. Additional sections address control structures, scoping rules and profiling techniques, while the author also discusses resources for staying updated through a podcast and accessing e-book versions for ongoing revisions.

Hands-On Programming with R

Designed for individuals with no prior coding experience, the book provides an introduction to programming in R while using practical examples to teach fundamental concepts such as data manipulation, function creation and the use of R's environment system. It is structured around hands-on projects, including simulations of weighted dice, playing cards and a slot machine, alongside explanations of core programming principles like objects, notation, loops and performance optimisation. Additional sections cover installation, package management, data handling and debugging techniques. While the book is written using RMarkdown and published under a Creative Commons licence, a physical edition is available through O’Reilly.

Advanced R

What you have here is one of several books written by Hadley Wickham. This one is published in its second edition as part of Chapman and Hall's R Series and is aimed primarily at R users who want to deepen their programming skills and understanding of the language, though it is also useful for programmers migrating from other languages. The book covers a broad range of topics organised into sections on foundations, functional programming, object-oriented programming, metaprogramming and techniques, with the latter including debugging, performance measurement and rewriting R code in C++.

Cookbook for R

Unlike Paul Teetor's separately published R Cookbook, the Cookbook for R was created by Winston Chang. It offers solutions to common tasks and problems in data analysis, covering topics such as basic operations, numbers, strings, formulas, data input and output, data manipulation, statistical analysis, graphs, scripts and functions, and tools for experiments.

R for Data Science

The second edition of R for Data Science by Hadley Wickham, Mine Çetinkaya-Rundel and Garrett Grolemund offers a structured approach to learning data science with R, covering essential skills such as data visualisation, transformation, import, programming and communication. Organised into chapters that explore workflows, data manipulation techniques and tools like Quarto for reproducible research, the book emphasises practical applications and best practices for handling data effectively.

R Graphics Cookbook

The R Graphics Cookbook, 2nd edition, offers a comprehensive guide to creating visualisations in R, structured into chapters that cover foundational skills such as installing and using packages, loading data from various formats and exploring datasets through basic plots. It progresses to detailed techniques for constructing bar graphs, line graphs, scatter plots and histograms, alongside methods for customising axes, annotations, themes and legends.

The book also addresses advanced topics like colour application, faceting data into subplots, generating specialised graphs such as network diagrams and heat maps and preparing data for visualisation through reshaping and summarising. Additional sections focus on refining graphical outputs for presentation, including exporting to different file formats and adjusting visual elements for clarity and aesthetics, while an appendix provides an overview of the {ggplot2} system.

R Markdown: The Definitive Guide

Published by Chapman & Hall/CRC, R Markdown: The Definitive Guide by Yihui Xie, J.J. Allaire and Garrett Grolemund covers the R Markdown document format, which has been in use since 2012 and is built on the knitr and Pandoc tools. The format allows users to embed code within Markdown documents and compile the results into a range of output formats including PDF, HTML and Word. The guide covers a broad scope of practical applications, from creating presentations, dashboards, journal articles and books to building interactive applications and generating blogs, reflecting how the ecosystem has matured since the {rmarkdown} package was first released in 2014.

A key principle running throughout is that Markdown's deliberately limited feature set is a strength rather than a drawback, encouraging authors to focus on content rather than complex typesetting. Despite this simplicity, the format remains highly customisable through tools such as Pandoc templates, LaTeX and CSS. Documents produced in R Markdown are also notably portable, as their straightforward syntax makes conversion between output formats more reliable, and because results are generated dynamically from code rather than entered manually, they are far more reproducible than those produced through conventional copy-and-paste methods.

R Markdown Cookbook

The R Markdown Cookbook is a practical guide designed to help users enhance their ability to create dynamic documents by combining analysis and reporting. It covers essential topics such as installation, document structure, formatting options and output formats like LaTeX, HTML and Word, while also addressing advanced features such as customisations, chunk options and integration with other programming languages. The book provides step-by-step solutions to common tasks, drawing on examples from online resources and community discussions to offer clear, actionable advice for both new and experienced users seeking to improve their workflow and explore the full potential of R Markdown.

RMarkdown for Scientists

This book provides a practical guide to using R Markdown for scientists, developed from a three-hour workshop and designed to evolve as a living resource. It covers essential topics such as setting up R Markdown documents, integrating with RStudio for efficient workflows, exporting outputs to formats like PDF, HTML and Word, managing figures and tables with dynamic references and captions, incorporating mathematical equations, handling bibliographies with citations and style adjustments, troubleshooting common issues and exploring advanced R Markdown extensions.

bookdown: Authoring Books and Technical Documents with R Markdown

Here is a guide to using the {bookdown} package, which extends R Markdown to facilitate the creation of books and technical documents. It covers Markdown syntax, integration of R code, formatting options for HTML, LaTeX and e-book outputs and features such as cross-referencing, custom blocks and theming. The package supports both multipage and single-document outputs, and its applications extend beyond traditional books to include course materials, manuals and other structured content. The work includes practical examples, publishing workflows and details on customisation, alongside information about licensing and the availability of a printed version.

[blogdown]: Creating Websites with R Markdown

Though the authors note that some information may be outdated due to recent updates to Hugo and the {blogdown} package, and they direct readers to additional resources for the latest features and changes, this book still provides a guide to building static websites using R Markdown and the Hugo static site generator, emphasising the advantages of this approach for creating reproducible, portable content. It covers installation, configuration, deployment options such as Netlify and GitHub Pages, migration from platforms like WordPress and advanced topics including custom layouts and version control as well as practical examples, workflow recommendations and discussions on themes, content management and technical aspects of website development.

[pagedown]: Create Paged HTML Documents for Printing from R Markdown

The R package {pagedown} enables users to create paged HTML documents suitable for printing to PDF, using R Markdown combined with a JavaScript library called paged.js, that later of which implements W3C specifications for paged media. While tools like LaTeX and Microsoft Word have traditionally dominated PDF production, pagedown offers an alternative approach through HTML and CSS, supporting a range of document types including resumes, posters, business cards, letters, theses and journal articles.

Documents can be converted to PDF via Google Chrome, Microsoft Edge or Chromium, either manually or through the chrome_print() function, with additional support for server-based, CI/CD pipeline and Docker-based workflows. The package provides customisable CSS stylesheets, a CSS overriding mechanism for adjusting fonts and page properties, and various formatting features such as lists of tables and figures, abbreviations, footnotes, line numbering, page references, cover images, running headers, chapter prefixes and page breaks. Previewing paged documents requires a local or remote web server, and the layout is sensitive to browser zoom levels, with 100% zoom recommended for the most accurate output.

Dynamic Documents with R and knitr

Developed by Yihui Xie and inspired by the earlier {Sweave} package, {knitr} is an R package designed for dynamic report generation that consolidates the functionality of numerous other add-on packages into a single, cohesive tool. It supports multiple input languages, including R, Python and shell scripts, as well as multiple output markup languages such as LaTeX, HTML, Markdown, AsciiDoc and reStructuredText. The package operates on a principle of transparency, giving users full control over how input and output are handled, and runs R code in a manner consistent with how it would behave in a standard R terminal.

Among its notable features are built-in caching, automatic code formatting via the {formatR} package, support for more than 20 graphics devices and flexible options for managing plots within documents. It also allows advanced users to define custom hooks and regular expressions to extend and tailor its behaviour further. The package is affiliated with the Foundation for Open Access Statistics, a nonprofit organisation promoting free software, open access publishing and reproducible research in statistics.

Mastering Shiny

Mastering Shiny is a comprehensive guide to developing web applications using R, focusing on the Shiny framework designed for data scientists. It introduces core concepts such as user interface design, reactive programming and dynamic content generation, while also exploring advanced topics like performance optimisation, security and modular app development. The book covers practical applications across industries, from academic teaching tools to real-time analytics dashboards, and aims to equip readers with the skills to build scalable, maintainable applications. It includes detailed chapters on workflow, layout, visualisation and user interaction, alongside case studies and technical best practices.

Engineering Production-Grade Shiny Apps

This is aimed at developers and team managers who already possess a working knowledge of the Shiny framework for R and wish to advance beyond the basics toward building robust, production-ready applications. Rather than covering introductory Shiny concepts or post-deployment concerns, the book focuses on the intermediate ground between those two stages, addressing project management, workflow, code structure and optimisation.

It introduces the {golem} package as a central framework and guides readers through a five-step workflow covering design, prototyping, building, strengthening and deployment, with additional chapters on optimisation techniques including R code performance, JavaScript integration and CSS. The book is structured to serve both those with project management responsibilities and those focused on technical development, acknowledging that in many small teams these roles are carried out by the same individual.

Outstanding User Interfaces with Shiny

Written by David Granjon and published in 2022, Outstanding User Interfaces with Shiny is a book aimed at filling the gap between beginner and advanced Shiny developers, covering how to deeply customise and enhance Shiny applications to the point where they become indistinguishable from classic web applications. The book spans a wide range of topics, including working with HTML and CSS, integrating JavaScript, building Bootstrap dashboard templates, mobile development and the use of React, providing a comprehensive resource that consolidates knowledge and experience previously scattered across the Shiny developer community.

R Packages

Now in its second edition, R Packages by Hadley Wickham and Jennifer Bryan is a freely available online guide that teaches readers how to develop packages in R. A package is the core unit of shareable and reproducible R code, typically comprising reusable functions, documentation explaining how to use them and sample data. The book guides readers through the entire process of package development, covering areas such as package structure, metadata, dependencies, testing, documentation and distribution, including how to release a package to CRAN. The authors encourage a gradual approach, noting that an imperfect first version is perfectly acceptable provided each subsequent version improves on the last.

Mastering Spark with R

Written by Javier Luraschi, Kevin Kuo and Edgar Ruiz, Mastering Spark with R is a comprehensive guide designed to take readers from little or no familiarity with Apache Spark or R through to proficiency in large-scale data science. The book covers a broad range of topics, including data analysis, modelling, pipelines, cluster management, connections, data handling, performance tuning, extensions, distributed computing, streaming and contributing to the Spark ecosystem.

Happy Git and GitHub for the useR

Here is a practical guide written by Jenny Bryan and contributors, aimed primarily at R users involved in data analysis or package development. It covers the installation and configuration of Git alongside GitHub, the development of key workflows for common tasks and the integration of these tools into day-to-day work with R and R Markdown. The guide is structured to take readers from initial setup through to more advanced daily workflows, with particular attention paid to how Git and GitHub serve the needs of data science rather than pure software development.

JavaScript for R

Written by John Coene and intended for release as part of the CRC Press R series, JavaScript for R explore how the R programming language and JavaScript can be used together to enhance data science workflows. Rather than teaching JavaScript as a standalone language, the book demonstrates how a limited working knowledge of it can meaningfully extend what R developers can achieve, particularly through the integration of external JavaScript libraries.

The book covers a broad range of topics, progressing from foundational concepts through to data visualisation using the {htmlwidgets} package, bidirectional communication with Shiny, JavaScript-powered computations via the V8 engine and Node.js and the use of modern JavaScript tools such as Vue, React and webpack alongside R. Practical examples are woven throughout, including the building of interactive visualisations, custom Shiny inputs and outputs, image classification and machine learning operations, with all accompanying code made publicly available on GitHub.

HTTP Testing in R

This guide addresses challenges faced by developers of R packages that interact with web resources, offering strategies to create reliable unit tests despite dependencies on internet connectivity, authentication and external service availability. It explores tools such as {vcr}, {webmockr}, {httptest} and {webfakes}, which enable mocking and recording HTTP requests to ensure consistent testing environments, reduce reliance on live data and improve test reliability. The text also covers advanced topics like handling errors, securing tests and ensuring compatibility with CRAN and Bioconductor, while emphasising best practices for maintaining test robustness and contributor-friendly workflows. Funded by rOpenSci and the R Consortium, the resource aims to support developers in building more resilient and maintainable R packages through structured testing approaches.

The Shiny AWS Book

The Shiny AWS Book is an online resource designed to teach data scientists how to deploy, host and maintain Shiny web applications using cloud infrastructure. Addressing a common gap in data science education, it guides readers through a range of DevOps technologies including AWS, Docker, Git, NGINX and open-source Shiny Server, covering everything from server setup and cost management to networking, security and custom configuration.

{ggplot2}: Elegant Graphics for Data Analysis

The third edition of {ggplot2}: Elegant Graphics for Data Analysis provides an in-depth exploration of the Grammar of Graphics framework, focusing on the theoretical foundations and detailed implementation of the ggplot2 package rather than offering step-by-step instructions for specific visualisations. Written by Hadley Wickham, Danielle Navarro and Thomas Lin Pedersen, the book is presented as an online work-in-progress, with content structured across sections such as layers, scales, coordinate systems and advanced programming topics. It aims to equip readers with the knowledge to customise plots according to their needs, rather than serving as a direct guide for creating predefined graphics.

YaRrr! The Pirate’s Guide to R

Written by Nathaniel D. Phillips, this is a beginner-oriented guide to learning the R programming language from the ground up, covering everything from installation and basic navigation of the RStudio environment through to more advanced topics such as data manipulation, statistical analysis and custom function writing. The guide progresses logically through foundational concepts including scalars, vectors, matrices and dataframes before moving into practical areas such as hypothesis testing, regression, ANOVA and Bayesian statistics. Visualisation is given considerable attention across dedicated chapters on plotting, while later sections address loops, debugging and managing data from a variety of file formats. Each chapter includes practical exercises to reinforce learning, and the book concludes with a solutions section for reference.

Data Visualisation: A Practical Introduction

Data Visualisation: A Practical Introduction is a forthcoming second edition from Princeton University Press, written by Kieran Healy and due for release in March 2026, which teaches readers how to explore, understand and present data using the R programming language and the {ggplot2} library. The book aims to bridge the gap between works that discuss visualisation principles without teaching the underlying tools and those that provide code recipes without explaining the reasoning behind them, instead combining both practical instruction and conceptual grounding.

Revised and updated throughout to reflect developments in R and {ggplot2}, the second edition places greater emphasis on data wrangling, introduces updated and new datasets, and substantially rewrites several chapters, particularly those covering statistical models and map-drawing. Readers are guided through building plots progressively, from basic scatter plots to complex layered graphics, with the expectation that by the end they will be able to reproduce nearly every figure in the book and understand the principles that inform each choice.

The book also addresses the growing role of large language models in coding workflows, arguing that genuine understanding of what one is doing remains essential regardless of the tools available. It is suitable for complete beginners, those with some prior R experience, and instructors looking for a course companion, and requires the installation of R, RStudio and a number of supporting packages before work can begin.

Some PowerShell fundamentals for practical automation

27^th October 2025

In the last few months, I have taken to using PowerShell for automating tasks while working on a new contract. There has been an element of vibe programming some of the scripts, which is why I wished to collate a reference guide that anyone can have to hand. While working with PowerShell every day does help to reinforce the learning, it also helps to look up granular concepts on a more bite-sized level. This especially matters given PowerShell's object-oriented approach. After all, many of us build things up iteratively from little steps, which also allows for more flexibility. Using an AI is all very well, yet the fastest recall is always from your on head.

1. Variables and Basic Data Types

Variables start with a dollar sign and hold values you intend to reuse, so names like $date, $outDir and $finalDir become anchors for later operations. Dates are a frequent companion in filenames and logs, and PowerShell's Get-Date makes this straightforward. A format string such as Get-Date -Format "yyyy-MM-dd" yields values like 2025-10-27, while Get-Date -Format "yyyy-MM-dd HH:mm:ss" adds a precise timestamp that helps when tracing the order of events. Because these commands return text when a format is specified, you can stitch the results into other strings without fuss.

2. File System Operations

As soon as you start handling files, you will meet a cluster of commands that make navigation robust rather than fragile. Join-Path assembles folder segments without worrying about stray slashes, Test-Path checks for the existence of a target, and New-Item creates folders when needed. Moving items with Move-Item keeps the momentum going once the structure exists.

Environment variables give cross-machine resilience; reading $env:TEMP finds the system's temporary area, and [Environment]::GetFolderPath("MyDocuments") retrieves a well-known Windows location without hard-coding. Setting context helps too, so Set-Location acts much like cd to make a directory the default focus for subsequent file operations. You can combine these approaches, as in cd ([Environment]::GetFolderPath("MyDocuments")), which navigates directly to the My Documents folder without hard-coded paths.

Scripts are often paired with nearby files, and Split-Path $ScriptPath -Parent extracts a parent folder from a full path so you can create companions in the same place. Network locations behave like local ones, with Universal Naming Convention paths beginning \ supported throughout, and Windows paths do not require careful case matching because the file systems are generally case-insensitive, which differs from many Unix-based systems. Even simple details matter, so constructing strings such as "$Folder*" is enough for wildcard searches, with backslashes treated correctly and whitespace handled sensibly.

3. Arrays and Collections

Arrays are created with @() and make it easy to keep related items together, whether those are folders in a $locs array or filenames gathered into $progs1, $progs2 and others. Indexing retrieves specific positions with square brackets, so $locs[0] returns the first entry, and a variable index like $outFiles[$i] supports loop counters.

A single value can still sit in an array using syntax such as @("bm_rc_report.sas"), which keeps your code consistent when functions always expect sequences. Any collection advertises how many items it holds using the Count property, so checking $files.Count equals zero tells you whether there is anything to process.

4. Hash Tables

When you need fast lookups, a hash table works as a dictionary that associates keys and values. Creating one with @{$locs[0] = $progs1} ties the first location to its corresponding programme list and then $locsProgs[$loc] retrieves the associated filenames for whichever folder you are considering. This is a neat stepping stone to loops and conditionals because it organises data around meaningful relationships rather than leaving you to juggle parallel arrays.

5. Control Flow

Control flow is where scripts begin to feel purposeful. A foreach loop steps through the items in a collection and is comfortable with nested passes, so you might iterate through folders, then the files inside each folder, and then a set of search patterns for those files. A for loop offers a counting pattern with initialisation, a condition and an increment written as for ($i = 0; $i -lt 5; $i++). It differs from foreach by focusing on the numeric progression rather than the items themselves.

Counters are introduced with $i = 0 and advanced with $i++, which in turn blends well with array indexing. Conditions gate work to what needs doing. Patterns such as if (-not (Test-Path ...)) reduce needless operations by creating folders only when they do not exist, and an else branch can note alternative outcomes, such as a message that a search pattern was not found.

Sometimes there is nothing to gain from proceeding, and break exits the current loop immediately, which is an efficient way to stop retrying once a log write succeeds. At other times it is better to skip just the current iteration, and continue moves directly to the next pass, which proves useful when a file list turns out to be empty for a given pattern.

6. String Operations

Strings support much of this work, so several operations are worth learning well. String interpolation allows variables to be embedded inside text using "$variable" or by wrapping expressions as "$($expression)", which becomes handy when constructing paths like "psoutput$($date)".

Splitting text is as simple as -split, and a statement such as $stub, $type = $File -split "." divides a filename around its dot, assigning the parts to two variables in one step. This demonstrates multiple variable assignment, where the first part goes to $stub and the second to $type, allowing you to decompose strings efficiently.

When transforming text, the -Replace operator substitutes all occurrences of one pattern with another, and you can chain replacements, as in -replace $Match, $Replace -replace $Match2, $Replace2, so each change applies to the modified output of the previous one.

Building new names clearly is easier with braces, as in "${stub}_${date}.txt", which prevents ambiguity when variable names abut other characters. Escaping characters is sometimes needed, so using "." treats a dot as a literal in a split operation. The backtick character ` serves as PowerShell's escape character and introduces special sequences like a newline written as `n, a tab as `t and a carriage return as `r. When you need to preserve formatting across lines without worrying about escapes, here-strings created with @" ... "@ keep indentation and line breaks intact.

7. Pipeline Operations

PowerShell's pipeline threads operations together so that the output of one command flows to the next. The pipe character | links each stage, and commands such as ForEach-Object (which processes each item), Where-Object (which filters items based on conditions) and Sort-Object -Unique (which removes duplicates) become building blocks that shape data progressively.

Within these blocks, the current item appears as $_, and properties exposed by commands can be read with syntax like $_.InputObject or $_.SideIndicator, the latter being especially relevant when handling comparison results. With pipeline formatting, you can emit compact summaries, as in ForEach-Object { "$($_.SideIndicator) $($_.InputObject)" }, which brings together multiple properties into a single line of output.

A multi-stage pipeline filtering approach often follows three stages: Select-String finds matches, ForEach-Object extracts only the values you need, and Where-Object discards anything that fails your criteria. This progressive refinement lets you start broad and narrow results step by step. There is no compulsion to over-engineer, though; a simplified pipeline might omit filtering stages if the initial search is already precise enough to return only what you need.

8. Comparison and Matching

Behind many of these steps sit comparison and matching operators that extend beyond simple equality. Pattern matching appears through -notmatch, which uses regular expressions to decide whether a value does not fit a given pattern, and it sits alongside -eq, -ne and -lt for equality, inequality and numeric comparison.

Complex conditions chain with -and, so an expression such as $_ -notmatch '^%macro$' -and $_ -notmatch '^%mend$' ensures both constraints are satisfied before an item passes through. Negative matching in particular helps exclude unwanted lines while leaving the rest untouched.

9. Regular Expressions

Regular expressions define patterns that match or search for text, often surfacing through operators such as -match and -replace. Simple patterns like .log$ identify strings ending with .log, while more elaborate ones capture groups using parentheses, as in (sdtm.|adam.), which finds two alternative prefixes.

Anchors matter, so ^ pins a match to the start of a line and $ pins it to the end, which is why ^%macro$ means an entire line consists of nothing but %macro. Character classes provide shortcuts such as w for word characters (letters, digits or underscores) and s for whitespace. The pattern "GRCw*" matches "GRC" followed by zero or more word characters, demonstrating how * controls repetition. Other quantifiers like + (one or more) and ? (zero or one) offer further control.

Escaping special characters with a backslash turns them into literals, so . matches a dot rather than any character. More complex patterns like '%(m|v).*?(?=[,(;s])' combine alternation with non-greedy matching and lookaheads to define precise search criteria.

When working with matches in pipelines, $_.Matches.Value extracts the actual text that matched the pattern, rather than returning the entire line where the match was found. This proves essential when you need just the matching portion for further processing. The syntax can appear dense at first, but PowerShell's integration means you can test patterns quickly within a pipeline or conditional, refining as you go.

10. File Content Operations

Searching file content with Select-String applies regular expressions to lines and returns match objects, while Out-File writes text to files with options such as -Append and -Encoding UTF8 to control how content is persisted.

11. File and Directory Searching

Commands for locating files typically combine path operations with filters. Get-ChildItem retrieves items from a folder, and parameters like -Filter or -Include narrow results by pattern. Wildcards such as * are often enough, but regular expressions provide finer control when integrated with pipeline operations. Recursion through subdirectories is available with -Recurse, and combining these techniques allows you to find specific files scattered across a directory tree. Once items are located, properties like FullName, Name and LastWriteTime let you decide what to do next.

12. Object Properties

Objects exposed by commands carry properties that you can access directly. $File.FullName retrieves an absolute path from a file object, while names, sizes and modification timestamps are all available as well. Subexpressions introduced with $() evaluate an inner expression within a larger string or command, which is why $($File.FullName) is often seen when embedding property values in strings. However, subexpressions are not always required; direct property access works cleanly in many contexts. For instance, $File.FullName -Replace ... reads naturally and works as you would expect because the property access is unambiguous when used as a command argument rather than embedded within a string.

13. Output and Logging

Producing output that can be read later is easier if you apply a few conventions. Write-Output sends structured lines to the console or pipeline, while Write-Warning signals notable conditions without halting execution, a helpful way to flag missing files. There are times when command output is unnecessary, and piping to Out-Null discards it quietly, for example when creating directories. Larger scripts benefit from consistency and a short custom function such as Write-Log establishes a uniform format for messages, optionally pairing console output with a line written to a file.

14. Functions

Functions tie these pieces together as reusable blocks with a clear interface. Defining one with function Get-UniquePatternMatches { } sets the structure, and a param() block declares the inputs. Strongly typed parameters like [string[]] make it clear that a function accepts an array of strings, and naming parameters $Folder and $Pattern describes their roles without additional comments.

Functions are called using named parameters in the format Get-UniquePatternMatches -Folder $loc -Pattern '(sdtm.|adam.)', which makes the intent explicit. It is common to pass several arrays into similar functions, so a function might have many parameters of the same type. Using clear, descriptive names such as $Match, $Replace, $Match2 and $Replace2 leaves little doubt about intent, even if an array of replacement rules would sometimes reduce repetition.

Positional parameters are also available; when calling Do-Compare you can omit parameter names and rely on the order defined in param(). PowerShell follows verb-noun naming conventions for functions, with common verbs including Get, Set, New, Remove, Copy, Move and Test. Following this pattern, as in Multiply-Files, places your code in the mainstream of PowerShell conventions.

It is worth avoiding a common pitfall where a function declares param([string[]]$Files) but inadvertently reads a variable like $progs from outside the function. PowerShell allows this via scope inheritance, where functions can access variables from parent scopes, but it makes maintenance harder and disguises the function's true dependencies. Being explicit about parameters creates more maintainable code.

Simple functions can still do useful work without complexity. A minimal function implementation with basic looping and conditional logic can accomplish useful tasks, and a recurring structure can be reused with minor revisions, swapping one regular expression for another while leaving the looping and logging intact. Replacement chains are flexible; add as many -replace steps as are needed, and no more.

Parameters can be reused meaningfully too, demonstrating parameter reuse where a $Match variable serves double duty: first as a filename filter in -Include, then as a text pattern for -replace. Nested function calls tie output and input together, as when piping a here-string to Out-File (Join-Path ...) to construct a file path at the moment of writing.

15. Comments

Comments play a quiet but essential role. A line starting with # explains why something is the way it is or temporarily disables a line without deleting it, which is invaluable when testing and refining.

16. File Comparison

Comparison across datasets rounds out common tasks, and Compare-Object identifies differences between two sets, telling you which items are unique to each side or shared by both. Side indicators in the output are compact: <= shows the first set, => the second, and == indicates an item present in both.

17. Common Parameters

Across many commands, common parameters behave consistently. -Force allows operations that would otherwise be blocked and overwrites existing items without prompting in contexts that support it, -LiteralPath treats a path exactly as written without interpreting wildcards, and -Append adds content to existing files rather than overwriting them. These options smooth edges when you know what you want a command to do and prefer to avoid interactive questions or unintended pattern expansion.

18. Advanced Scripting Features

A number of advanced features make scripts sturdier. Automatic variables such as $MyInvocation.MyCommand.Path provide information about the running script, including its full path, which is practical for locating resources relative to the script file. Set-StrictMode -Version Latest enforces stricter rules that turn common mistakes into immediate errors, such as using an uninitialised variable or referencing a property that does not exist. Clearing the console at the outset with Clear-Host gives a clean slate for output when a script begins.

19. .NET Framework Integration

Integration with the .NET Framework extends PowerShell's reach, and here are some examples. For instance, calling [System.IO.Path]::GetFileNameWithoutExtension() extracts a base filename using a tested library method. To gain more control over file I/O, [System.IO.File]::Open() and System.IO.StreamWriter expose low-level handles that specify sharing and access modes, which can help when you need to coordinate writing without blocking other readers. File sharing options like [System.IO.FileShare]::Read allow other processes to read a log file while the script writes to it, reducing contention and surprises.

20. Error Handling

Error handling deserves a clear pattern. Wrapping risky operations in try { } catch { } blocks captures exceptions, so a script can respond gracefully, perhaps by writing a warning and moving on. A finally block can be added for clean-up operations that must run regardless of success or failure.

When transient conditions are expected, a retry logic pattern is often enough, pairing a counter with Start-Sleep to attempt an operation several times before giving up. Waiting for a brief period such as Start-Sleep -Milliseconds 200 gives other processes time to release locks or for temporary conditions to clear.

Alongside this, checking for null values keeps assumptions in check, so conditions like if ($null -ne $process) ensure that you only read properties when an object was created successfully. This defensive approach prevents cascading errors when operations fail to return expected objects.

21. External Process Management

Managing external programmes is a common requirement, and PowerShell's Start-Process offers a controlled route. Several parameters control its behaviour precisely:

The -Wait parameter makes PowerShell pause until the external process completes, essential for sequential processing where later steps depend on earlier ones. The -PassThru parameter returns a process object, allowing you to inspect properties like exit codes after execution completes. The -NoNewWindow parameter runs the external process in the current console rather than opening a new window, keeping output consolidated. If a command expects the Command Prompt environment, calling it via cmd.exe /c $cmd integrates cleanly, ensuring compatibility with programmes designed for the CMD shell.

Exit codes reported with $process.ExitCode indicate success with zero and errors with non-zero values in most tools, so checking these numbers preserves confidence in the sequence of steps. The script demonstrates synchronous execution, processing files one at a time rather than in parallel, which can be an advantage when dependencies exist between stages or when you need to ensure ordered completion.

22. Script Termination

Scripts need to finish in a way that other tools understand. Exiting with Exit 0 signals success to schedulers and orchestrators that depend on numeric codes, while non-zero values indicate error conditions that trigger alerts or retries.

Bringing It All Together

Because this is a granular selection, it leaves it to us to piece everything together to accomplish the tasks that we have to complete. In that way, we can embed the knowledge so that we are vibe coding all the time, ensuring that a more deterministic path is followed.

Keeping a graphical eye on CPU temperature and power consumption on the Linux command line

20^th March 2025

Following my main workstation upgrade in January, some extra monitoring has been needed. This follows on from the experience with building its predecessor more than three years ago.

Being able to do this in a terminal session keeps things lightweight, and I have done that with text displays like what you see below using a combination of sensors and nvidia-smi in the following command:

watch -n 2 "sensors | grep -i 'k10'; sensors | grep -i 'tdie'; sensors | grep -i 'tctl'; echo "" | tee /dev/fd/2; nvidia-smi"

Everything is done within a watch command that refreshes the display every two seconds. Then, the panels are built up by a succession of commands separated with semicolons, one for each portion of the display. The grep command is used to pick out the desired output of the sensors command that is piped to it; doing that twice gets us two lines. The next command, echo "" | tee /dev/fd/2, adds an extra line by sending a space to STDERR output before the output of nvidia-smi is displayed. The result can be seen in the screenshot below.

However, I also came across a more graphical way to do things using commands like turbostat or sensors along with AWK programming and ttyplot. Using the temperature output from the above and converting that needs the following:

while true; do sensors | grep -i 'tctl' | awk '{ printf("%.2f\n", $2); fflush(); }'; sleep 2; done | ttyplot -s 100 -t "CPU Temperature (Tctl)" -u "°C"

This is done in an infinite while loop to keep things refreshing; the watch command does not work for piping output from the sensors command to both the awk and ttyplot commands in sequence and on a repeating, periodic basis. The awk command takes the second field from the input text, formats it to two places of decimals and prints it before flushing the output buffer afterwards. The ttyplot command then plots those numbers on the plot seen below in the screenshot with a y-axis scaled to a maximum of 100 (-s), units of °C (-u) and a title of CPU Temperature (Tctl) (-t).

A similar thing can be done for the CPU wattage, which is how I learned of the graphical display possibilities in the first place. The command follows:

sudo turbostat --Summary --quiet --show PkgWatt --interval 1 | sudo awk '{ printf("%.2f\n", $1); fflush(); }' | sudo ttyplot -s 200 -t "Turbostat - CPU Power (watts)" -u "watts"

Handily, the turbostat can be made to update every so often (every second in the command above), avoiding the need for any infinite while loop. Since only a summary is needed for the wattage, all other output can be suppressed, though everything needs to work using superuser privileges, unlike the sensors command earlier. Then, awk is used like before to process the wattage for plotting; the first field is what is being picked out here. After that, ttyplot displays the plot seen in the screenshot below with appropriate title, units and scaling. All works with output from one command acting as input to another using pipes.

All of this offers a lightweight way to keep an eye on system load, with the top command showing the impact of different processes if required. While there are graphical tools for some things, command line possibilities cannot be overlooked either.

Upgrading Julia packages

23^rd January 2024

Whenever a new version of Julia is released, I have certain actions to perform. With Julia 1.10, installing and updating it has become more automated thanks to shell scripting or the use of WINGET, depending on your operating system. Because my environment predates this, I found that the manual route still works best for me, and I will continue to do that.

Returning to what needs doing after an update, this includes updating Julia packages. In the REPL, this involves dropping to the PKG subshell using the ] key if you want to avoid using longer commands or filling your history with what is less important for everyday usage.

Previously, I often ran code to find a package was missing after updating Julia, so the add command was needed to reinstate it. That may raise its head again, but there also is the up command for upgrading all packages that were installed. This could be a time saver when only a single command is needed for all packages and not one command for each package as otherwise might be the case.

Rendering Markdown into HTML using PHP

3^rd December 2022

One of the good things about using virtual private servers for hosting websites instead of shared hosting or using a web application service like WordPress.com or Tumblr is that you get added control and flexibility. There was a time when HTML, CSS and client-side scripting were all that was available from the shared hosting providers that I was using back then. Then, static websites were my lot until it became possible to use Perl server side scripting. PHP predominates now, but Python or Ruby cannot be discounted either.

Being able to install whatever you want is a bonus as well, though it means that you also are responsible for the security of the containers that you use. There will be infrastructure security, but that of your own machine will be your own concern. Added power always means added responsibility, as many might say.

The reason that these thought emerge here is that getting PHP to render Markdown as HTML needs the installation of Composer. Without that, you cannot use the CommonMark package to do the required back-work. All the command that you see here will work on Ubuntu 22.04. First, you need to download Composer and executing the following command will accomplish this:

curl https://getcomposer.org/installer -o /tmp/composer-setup.php

Before the installation, it does no harm to ensure that all is well with the script before proceeding. That means that capturing the signature for the script using the following command is wise:

HASH=`curl https://composer.github.io/installer.sig`

Once you have the script signature, then you can check its integrity using this command:

php -r "if (hash_file('SHA384', '/tmp/composer-setup.php') === '$HASH') { echo 'Installer verified'; } else { echo 'Installer corrupt'; unlink('composer-setup.php'); } echo PHP_EOL;"

The result that you want is "Installer verified". If not, you have some investigating to do. Otherwise, just execute the installation command:

sudo php /tmp/composer-setup.php --install-dir=/usr/local/bin --filename=composer

With Composer installed, the next step is to run the following command in the area where your web server expects files to be stored. That is important when calling the package in a PHP script.

composer require league/commonmark

Then, you can use it in a PHP script like so:

define("ROOT_LOC",$_SERVER['DOCUMENT_ROOT']); include ROOT_LOC . '/vendor/autoload.php'; use League\CommonMark\CommonMarkConverter; $converter = new CommonMarkConverter(); echo $converter->convertToHtml(file_get_contents(ROOT_LOC . '<location of markdown file>));

The first line finds the absolute location of your web server file directory before using it when defining the locations of the autoload script and the required markdown file. The third line then calls in the CommonMark package, while the fourth sets up a new object for the desired transformation. The last line converts the input to HTML and outputs the result.

If you need to render the output of more than one Markdown file, then repeating the last line from the preceding block with a different file location is all you need to do. The CommonMark object persists and can be used like a variable without needing the reinitialisation to be repeated every time.

The idea of building a website using PHP to render Markdown has come to mind, but I will leave it at custom web pages for now. If an opportunity comes, then I can examine the idea again. Before, I had to edit HTML, but Markdown is friendlier to edit, so that is a small advance for now.

Using multi-line commenting in Perl to inactivate blocks of code during testing

26^th December 2019

Recently, I needed to inactivate blocks of code in a Perl script while doing some testing. Since this is something that I often do in other computing languages, I sought the same in Perl. To accomplish that, I need to use the POD methodology. This meant enclosing the code as follows.

=start

<< Code to be inactivated by inclusion in a comment >>

=cut

While the =start line could use any word after the equality sign, it seems that =cut is required to close the multi-line comment. If this was actual programming documentation, then the comment block should include some meaningful text for use with perldoc. However, that was not a concern here because the commenting statements would be removed afterwards anyway. It also is good practice not to leave commented code in a production script or program to avoid any later confusion.

In my case, this facility allowed me to isolate the code that I had to alter and test before putting everything back as needed. It also saved time since I did not need to individually comment out every executable line because multiple lines could be inactivated at a time.

Using the IN operator in SAS Macro programming

8^th October 2012

This useful addition came in SAS 9.2, and I am amazed that it isn’t enabled by default. To accomplish that, you need to set the MINOPERATOR option, unless someone has done it for you in the SAS AUTOEXEC or another configuration program. Thus, the safety first approach is to have code like the following:

options minoperator;

%macro inop(x);
    %if &x in (a b c) %then %do;
        %put Value is included;
    %end;
    %else %do;
        %put Value not included;
    %end;
%mend inop;

%inop(a);

Also, the default delimiter is the space, so if you need to change that, then the MINDELIMITER option needs setting. Adjusting the above code so that the delimiter now is the comma character gives us the following:

options minoperator mindelimiter=",";

%macro inop(x);
    %if &x in (a b c) %then %do;
        %put Value is included;
    %end;
    %else %do;
        %put Value not included;
    %end;
%mend inop;

%inop(a);

Without any of the above, the only approach is to have the following, and that is what we had to do for SAS versions before 9.2:

%macro inop(x);
    %if &x=a or &x=b or &x=c %then %do;
        %put Value is included;
    %end;
    %else %do;
        %put Value not included;
    %end;
%mend inop;

%inop(a);

While it may be clunky, it does work and remains a fallback in newer versions of SAS. Saying that, having the IN operator available makes writing SAS Macro code that little bit more swish, so it's a good thing to know.

On Making PROC REPORT Work Harder

1^st September 2010

In the early years of my SAS programming career, there seemed to be just the one procedure to use if you wanted to create a summary table. That was TABULATE and it was great for generating columns according to the value of a variable such as the treatment received by a subject in a clinical study. To a point, it could generate statistics for you too, and I often used it to sum frequency and percentage variables. Since then, it seems to have been enhanced a little and surprised me with the statistics it could produce when I had a recent play. Here's the code:

proc tabulate data=sashelp.class;
    class sex;
    var age;
    table age*(n median*f=8. mean*f=8.1 std*f=8.1 min*f=8. max*f=8. lclm*f=8.1 uclm*f=8.1),sex
	  / misstext="0";
run;

When you compare that with the idea of creating one variable per column and then defining them in PROC REPORT as many do, it looks more elegant and the results aren't bad either, though they can be tweaked further from the quick example that I generated. That last comment brings me to the point that PROC REPORT seems to have taken over from TABULATE wherever I care to look these days, and I do ask myself if it is the right tool for that for which it is being used or if it is being used in the best way.

While using Data Step to create one variable per column in a PROC REPORT output doesn't strike me as the best way to write reusable code, there are ways to make PROC REPORT do more for you. For example, by defining GROUP, ACROSS and ANALYSIS columns in an output, you can persuade the procedure to do the summarising for you and there's some example code below with the comma nesting height under sex in the resulting table. Sums are created by default if you do this, and forgoing an analysis column definition means that you get a frequency table, not at all a useless thing in numerous instances.

proc report data=sashelp.class nowd missing;
    columns age sex,height;
    define age / group "Age";
    define sex / across "Sex";
    define height / analysis mean f=missing. "Mean Height";
run;

For those times when you need to create more heavily formatted statistics (summarising range as min-max rather showing min and max separately, for example), you might feel that the GROUP/ACROSS set-up's non-display of character values puts a stop to using that approach. However, I found that making every value combination unique and attaching a cell ID helps to work around the problem. Then, you can create a format control data set from the data like in the code below and create a format from that which you can apply to the cell ID's to display things as you need them. This method does make things more portable from situation to situation than adding or removing columns depending on the values of a classification variable.

proc sql noprint;
    create table cntlin as
        select distinct "fmtname" as fmtname, cellid as start, cellid as end, decode as label
            from report;
quit;

proc format lib=work cntlin=cnlin; run;

Understanding Perl binding operators for pattern matching

20^th May 2009

While this piece is as much an aide de memoire for myself as anything else, putting it here seems worthwhile if it answers questions for others. The binding operators, =~ or !~, come in handy when you are framing conditional statements in Perl using Regular Expressions, for example, testing whether x =~ /\d+/ or not. The =~ variant is also used for changing strings using the s/[pattern1]/[pattern2]/ regular expression construct (here, s stands for "substitute"). What has brought this to mind is that I wanted to ensure that something was done for strings that did not contain a certain pattern, and that's where the !~ binding operator came in useful; ^~ might have come to mind for some reason, but it wasn't what I needed.

When web hosting limits become too restrictive

27^th April 2009

Fasthosts has, in their wisdom, decided to limit the execution time for ASP scripts to 15 seconds and 10 seconds for any others. I haven't used Perl sufficiently in this shared hosting set up to determine how that is affected. In contrast, I can share my experiences on the PHP side and you may have noticed occasional glitches. They have also disabled the set_time_limit PHP function, so you cannot easily address the matter yourself where you need to do it. You almost get the feeling that they don't trust the abilities, actions and oversight of their users. Personally, I reckon that the ten-second limit is too short and that something of the order of 20 or 30 seconds would be better. If it all gets too restrictive, I suppose that there are other providers, though I think that I would avoid resellers after a previous less than glorious experience. There's the dedicated server option too if I was feeling flush, not so likely given the economic times in which we live.

« Older Entries «