TOPIC: BUSINESS INTELLIGENCE SOFTWARE
Cross-platform file and directory renaming in SAS with added return code handling
Here is another post on operating system-level actions performed from within SAS for the sake of being both robust and cross-platform. File deletion has been one example, so here is file and directory renaming. The latter came in handy during some debugging as well as a piece of validation testing.
The rename function applies to more than files, though, which means that the "FILE" parameter (in quotes below) is as essential as those for source and target file paths. That tripped me up when I went about doing it for the first time. All the action happens within a data step:
data _null_;
    rc = rename("&file_name..xlsx", "&file_name..bak", "FILE");
    if rc = 0 then putlog "File renamed successfully";
    else do;
        putlog "Rename failed";
        msg = sysmsg();
        putlog "SYSMSG: " msg;
    end;
run;
Here, we have a null data step, which means that no output data set is written to disk. The return code from the operation is captured in a variable called rc. A value of 0 means the operation completed successfully, while any other value indicates a failure whose details can be retrieved using the sysmsg function, as seen above. Putting this together, you can write not only the result of the operation to the log using putlog statements, but also extra information useful for diagnosing what happened and fixing it.
As alluded to earlier, the rename function can be used with other object types too. Instead of "FILE", the third argument can be "ACCESS" for an access descriptor created using SAS/ACCESS software, "CATALOG" for a SAS catalog or catalog entry, "DATA" for a SAS data set (the default, which explains how I got caught out as mentioned earlier) or "VIEW" for a SAS view. That makes the function more widely useful, even if the DATASETS procedure is worth considering for some of these.
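For instance, renaming a data set works in much the same way. Here is a hedged example of the default member type, with illustrative names for a WORK library member:
data _null_;
    /* "data" is the default type: the source is a libref.member name,
       the target a one-level name (both illustrative) */
    rc = rename("work.sales_old", "sales_new", "data");
    if rc = 0 then putlog "Data set renamed successfully";
    else putlog "Rename failed: " rc=;
run;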
Some R packages to explore as you find your feet with the language
Here are some commonly used R packages and tools, some of them near-ubiquitous, along with others that I have encountered while getting started with the language, which itself is becoming pervasive in my line of business. The collection grew organically as my explorations proceeded and reflects what I was trying out during my acclimatisation.
General
Here are two general packages to get things started, with one of them being unavoidable in the R world. The other is more advanced, possibly offering more to package developers.
You cannot use R without knowing about the {tidyverse}, a collection of packages that in many ways forms a mini-language of its own, drawing some criticism from those who reckon that base R functionality covers a sufficient gamut anyway. Nevertheless, there is so much here to get you going with data wrangling and visualisation that it is worth knowing what is possible. Indeed, the complaints may stem from there being little need to use anything else for these purposes.
The {plumber} R package enables developers to convert existing R functions into web API endpoints by adding roxygen2-like comment annotations to their code. Once annotated, functions can handle HTTP GET and POST requests, accept query string or JSON parameters and return outputs such as plain values or rendered plots. The package is available on CRAN as a stable release, with a development version hosted on GitHub. For deployment, it integrates with DigitalOcean through a companion package called {plumberDeploy}, and also supports Posit Connect, PM2 and Docker as hosting options. Related projects in the same space include OpenCPU, which is designed for hosting R APIs in scientific research contexts, and the now-discontinued jug package, which took a more programmatic approach to API construction.
Data Preparation
You simply cannot avoid working with data during any analysis or reporting work. While there is a learning curve if you are used to other languages, there is little doubt that R is well-endowed when it comes to performing these tasks. Here are some packages that extend base R capabilities and might even add some extra user-friendliness along the way.
The {forcats} package in R provides functions to manage categorical variables by reordering factor levels, collapsing infrequent values and adjusting their sequence based on frequency or other variables. It includes tools such as reordering by another variable, grouping rare categories into 'other' and modifying level order manually, which are useful for data analysis and visualisation workflows. Designed as part of the tidyverse, it integrates with other packages to streamline tasks like counting and plotting categorical data, enhancing clarity and efficiency in handling factors within R.
Around this time last year, I remember completing a LinkedIn course on a set of good practices known as tidy data, where each variable occupies a column, each observation a row and each value a single cell. The {tidyr} package is designed to help users restructure data so that it follows those rules. It provides tools for reshaping data between long and wide formats, handling nested lists, splitting or combining columns, managing missing values and layering or flattening grouped data.
Installation options include the {tidyverse} collection, standalone installation, or the development version from GitHub. The package succeeds earlier reshaping tools like {reshape2} and {reshape}, offering a focused approach to tidying data rather than general reshaping or aggregation.
Having a long track record of working with SAS, I find my interest aroused by {haven} and its abilities to read and write data files from statistical software such as SAS, SPSS and Stata, leveraging the ReadStat library. Handily, it supports a range of file formats, including SAS transport and data files, SPSS system and older portable files and Stata data files up to version 15, converting these into tibbles with enhanced printing capabilities. Value labels are preserved as a labelled class, allowing conversion to factors, while dates and times are transformed into standard R classes.
While there are other approaches to working with databases using R, {RMariaDB} provides a database interface and driver for MariaDB, designed to fully comply with the DBI specification and serve as a replacement for the older {RMySQL} package. It supports connecting to databases using configuration files, executing queries, reading and writing data tables and managing results in chunks. Installation options include binary packages from CRAN or development versions from GitHub, with additional dependencies such as MariaDB Connector/C or libmysqlclient required for Linux and macOS systems. Configuration is typically handled through a MariaDB-specific file, and the package includes acknowledgments for contributions from various developers and organisations.
For many people, the pandemic may be a fading memory, yet it offered opportunities for learning R, not least because there was a use case with more than a hint of personal interest about it. Here is a library making it easier to get hold of the data, with some added pre-processing too. Memories of how I needed to wrangle what was published by various sources make me appreciate just how vital it is to have harmonised data for analysis work.
Table Production
While many appear to prefer graphical presentation of results over tabular display, R has its options here too. In recent times, these have improved, particularly through the pharmaverse initiative. Here is a selection of what I found during my explorations.
Part of the {officeverse} along with {officedown}, {flextable}, {rvg} and {mschart}, the {officer} R package enables users to create and modify Word and PowerPoint documents directly from R, allowing the insertion of images, tables and formatted content, as well as the import of document content into data frames. It supports the generation of RTF files and integrates with other packages for advanced features such as vector graphics and native office charts. Installation options include CRAN and GitHub, with community resources available for assistance and contributions. The package facilitates the manipulation of document elements like paragraphs, tables and section breaks and provides tools for exporting and importing content between R and office formats, alongside functions for managing slide layouts and embedded objects in presentations.
If you work in clinical research like I do, the need to produce data tabulations is a non-negotiable requirement. That is how this package came to be developed, and the pharmaverse of which it is part has numerous other options, should you need to look at using one of those. The flavour of RTF produced here is the Microsoft Word variety, which did not look as good in LibreOffice Writer when I last viewed the results with that open-source alternative. Otherwise, the output looks fine to many eyes.
Here is a package that enhances data presentation by applying customisable formatting to vectors and data frames, supporting formats such as percentages, currency and accounting. Available on GitHub and CRAN, it integrates with dynamic document tools like {knitr} and {rmarkdown} to produce visually distinct tables, with features including gradient colour scales, conditional styling and icon-based representations. It automatically converts to {htmlwidgets} in interactive environments and is licensed under MIT, enabling flexible use in both static and interactive data displays.
The {reactable} package for R provides interactive data tables built on the React Table library, offering features such as sorting, filtering, pagination, grouping with aggregation, virtual scrolling for large datasets and support for custom rendering through R or JavaScript. It integrates seamlessly into R Markdown documents and Shiny applications, enabling the use of HTML widgets and conditional styling. Installation options include CRAN and GitHub, with examples demonstrating its application across various datasets and scenarios. The package supports major web browsers and is licensed under MIT, designed for developers seeking dynamic data presentation tools within the R ecosystem.
Particularly useful in dynamic web applications like Shiny, the {DT} package in R provides a means of rendering interactive HTML tables by building on the DataTables JavaScript library. It supports features including sorting, searching, pagination and advanced filtering, with numeric, date and time columns using range-based sliders whilst factor and character columns rely on search boxes or dropdowns. Filtering operates on the client side by default, though server-side processing is also available. JavaScript callbacks can be injected after initialisation to manipulate table behaviour, such as enabling automatic page navigation or adding child rows to display additional detail. HTML content is escaped by default as a safeguard against cross-site scripting attacks, with the option to adjust this on a per-column basis. Whilst the package integrates with Shiny applications, attention is needed around scrolling and slider positioning to prevent layout problems. Overall, the package is well suited to exploratory data analysis and the building of interactive dashboards.
The {gt} package in R enables users to create well-structured tables with a variety of formatting options, starting from data frames or tibbles and incorporating elements such as headers, footers and customised column labels. It supports output in HTML, LaTeX and RTF formats and includes example datasets for experimentation. The package prioritises simplicity for common tasks while offering advanced functions for detailed customisation, with installation available via CRAN or GitHub. Users can access resources like documentation, community forums and example projects to explore its capabilities, and it is supported by a range of related packages that extend its functionality.
Enabling users to produce publication-ready outputs with minimal code, the {gtsummary} package offers a streamlined approach to generating analytical and summary tables in R. It automates the summarisation of data frames, regression models and other datasets, identifying variable types and calculating relevant statistics, including measures of data incompleteness. Customisation options allow for formatting, merging and styling tables to suit specific needs, while integration with packages such as {broom} and {gt} facilitates seamless incorporation into R Markdown workflows. The package supports the creation of side-by-side regression tables and provides tools for exporting results as images, HTML, Word, or LaTeX files, enhancing flexibility for reporting and sharing findings.
Here is an R package designed to generate LaTeX and HTML tables with a modern, user-friendly interface, offering extensive control over styling, formatting, alignment and layout. It supports features such as custom borders, padding, background colours and cell spanning across rows or columns, with tables modifiable using standard R subsetting or dplyr functions. Examples demonstrate its use for creating simple tables, applying conditional formatting and producing regression output with statistical details. The package also facilitates quick export to formats like PDF, DOCX, HTML and XLSX. Installation options include CRAN, R-Universe and GitHub, while the name reflects its origins as an enhanced version of the {xtable} package. The logo was generated using the package itself, and the background design draws inspiration from Piet Mondrian’s artwork.
Figure Generation
R has such a reputation for graphical presentations that it is cited as a strong reason to explore what the ecosystem has to offer. While base R itself is not shabby when it comes to creating graphs and charts, these packages will extend things by quite a way. In fact, the first on this list is near enough pervasive.
Though its default formatting does not appeal to me, the myriad of options makes {ggplot2} a very flexible tool, albeit at the expense of some code verbosity. Multi-panel plots are not among its strengths, which may send you elsewhere for that need.
Focusing on features not included in the core library, the {ggforce} package extends {ggplot2} by offering additional tools to enhance data visualisation. Designed to complement the primary role of {ggplot2} in exploratory data analysis, it provides a range of geoms, stats and other components that are well-documented and implemented, aiming to support more complex and custom plot compositions. Available for installation via CRAN or GitHub, the package includes a variety of functionalities described in detail on its associated website, though specific examples are not included here.
Developed by Claus O. Wilke for internal use in his lab, {cowplot} is an R package designed to help with the creation of publication-quality figures built on top of {ggplot2}. It provides a set of themes, tools for aligning and arranging plots into compound figures and functions for annotating plots or combining them with images. The package can be installed directly from CRAN or as a development version via GitHub, and it has seen widespread use in the book Fundamentals of Data Visualisation.
The {sjPlot} package provides a range of tools for visualising data and statistical results commonly used in social science research, including frequency tables, histograms, box plots, regression models, mixed effects models, PCA, correlation matrices and cluster analyses. It supports installation via CRAN for stable releases or through GitHub for development versions, with documentation and examples available online. The package is licensed under GPL-3 and developed by Daniel Lüdecke, offering functions to create visualisations such as scatter plots, Likert scales and interaction effect plots, along with tools for constructing index variables and presenting statistical outputs in tabular formats.
By offering a centralised approach to theming and enabling automatic adaptation of plot styles within Shiny applications, the {thematic} package simplifies the styling of R graphics, including {ggplot2}, {lattice} and base R plots, R Markdown documents and RStudio. It allows users to apply consistent visual themes across different plotting systems, with auto-theming in Shiny and R Markdown relying on CSS and {bslib} themes, respectively. Installation requires specific versions of dependent packages such as {shiny} and {rmarkdown}, while custom fonts benefit from {showtext} or {ragg}. Users can set global defaults for background, foreground and accent colours, as well as fonts, which can be overridden with plot-specific theme adjustments. The package also defines default colour scales for qualitative and sequential data and integrates with tools like bslib to import Google Fonts, enhancing visual consistency across different environments and user interfaces.
Publishing Tools
The R ecosystem goes beyond mere graphical and tabular display production to offer means of taking things much further, often providing platforms for publishing your work. These can be used locally too, so there is no need to entrust everything to a third-party provider. The possible uses are endless, and it appears that Posit has used these tools to help with building documentation and training too.
What you have here is one of those distinguishing facilities of the R ecosystem, particularly for those wanting to share their analysis work with more than a hint of reproducibility. The tool combines narrative text and code to generate various outputs, supporting multiple programming languages and formats such as HTML, PDF and dashboards. It enables users to produce reports, presentations and interactive applications, with options for publishing and scheduling through platforms like RStudio Connect, facilitating collaboration and distribution of results in professional settings.
Distill for R Markdown is a tool designed to streamline the creation of technical documents, offering features such as code folding, syntax highlighting and theming. It builds on existing frameworks like Pandoc, MathJax and D3, enabling the production of dynamic, interactive content. Users can customise the appearance with CSS and incorporate appendices for supplementary information. The tool acknowledges the contributions of developers who created foundational libraries, ensuring accessibility and functionality for a wide audience. Its design prioritises clarity, allowing authors to focus on presenting results rather than underlying code, while maintaining flexibility for those who wish to include detailed explanations.
For a while, Shiny was one of R's unique selling points, and it remains as compelling a reason to use the language even now that Python has its own version of the package. By enabling the creation of interactive web applications for data analysis without requiring web development expertise, it allows users to build interfaces that let others explore data through dynamic visualisations and filters. Here is a simple example: an app that generates scatter plots with adjustable variables, species filters and marginal plots, hosted either on personal servers or through a dedicated hosting service.
The {bslib} R package offers a modern user interface toolkit for Shiny and R Markdown applications, leveraging Bootstrap to enable the creation of customisable dashboards and interactive theming. It supports the use of updated Bootstrap and Bootswatch versions while maintaining compatibility with existing defaults, and provides tools for real-time visual adjustments. Installation is available through CRAN, with example previews demonstrating its capabilities.
Enabling users to manipulate and validate data within a spreadsheet-like interface, the {rhandsontable} package introduces an interactive data grid for R. It supports features such as custom cell rendering, validation rules and integration with Shiny applications. When used in Shiny, the widget requires explicit conversion of data using the hot_to_r function, as updates may not be immediately reflected in reactive contexts. Examples demonstrate its application in various scenarios, including date editing, financial calculations and dynamic visualisations linked to charts. The package also accommodates bookmarks in Shiny apps with specific handling. Users are encouraged to report issues or contribute improvements, with guidance provided for those seeking to expand its functionality. The development team welcomes feedback to refine the tool further, ensuring it aligns with evolving user needs.
{xaringanExtra} offers a range of enhancements and extensions for creating and presenting slides with xaringan, enabling features such as adding an overview tile view, making slides editable, broadcasting in real time, incorporating animations, embedding live video feeds and applying custom styles. It allows users to selectively activate individual tools or load multiple features simultaneously through a single function call, supporting tasks like adding banners, enabling code copying, fitting slides to screen dimensions and integrating utility toolkits. The package is available for installation via CRAN or GitHub, providing flexibility for developers and presenters seeking to expand the functionality of their slides.
Moving from 32-Bit to 64-Bit SAS on Windows
Moving from 32-bit SAS on Microsoft Windows to a 64-bit environment can look deceptively straightforward from the outside. The operating system is still Windows, programmes often run without alteration, and many data sets open just as expected. Beneath that continuity, however, sit several technical differences that matter considerably in practice, especially for organisations with long-lived code, established format libraries and regular exchanges with Microsoft Office files.
What makes this transition particularly awkward is that SAS treats some of these changes as more than a simple in-place upgrade. As Jacques Thibault notes in his PharmaSUG 2012 paper, a new operating system will often be accompanied by a new version of surrounding applications, and what matters most is ensuring sufficient time and resources to fully test existing programmes under the new environment before committing to the change. SAS file types are not uniformly portable across the 32-bit to 64-bit boundary, and support behaviour also differs by SAS release, with SAS 9.3 marking the point at which some earlier friction was meaningfully reduced. As of 2025, the current release of the SAS 9 line is SAS 9.4 Maintenance 9 (M9), and organisations running any SAS 9.4 release benefit from the data-set interoperability improvements first introduced in SAS 9.3, whilst the catalog and Office-integration issues described in this article remain relevant across all SAS 9.x environments.
Data Sets and Catalogs: A Fundamental Distinction
The broadest distinction is between SAS data sets and SAS catalogs. Data sets are generally more forgiving, while catalogs are not. SAS Usage Note 38339 explains that when upgrading from 32-bit to 64-bit Windows SAS in releases earlier than SAS 9.3, Cross-Environment Data Access (CEDA) is invoked to access 32-bit SAS data sets. CEDA allows the file to be read without immediate conversion, though it can impose restrictions and may reduce performance. The same note states directly that 64-bit SAS provides no access to 32-bit catalogs at all.
That distinction sits at the centre of most migration problems, and it is the reason a move that feels routine can catch teams off guard when they first encounter the ERROR: CATALOG was created for a different operating system message. As Chris Hemedinger explains in a post on The SAS Dummy, the move from 32-bit SAS for Windows to 64-bit SAS for Windows is, for all intents and purposes, a platform change from SAS's perspective, even though only the bit architecture has changed, and SAS catalogs are not portable across platforms.
How SAS Handles Data Sets Across the Boundary
For data sets, the picture is comparatively manageable. If a 32-bit SAS data set is opened in a 64-bit SAS session in releases before SAS 9.3, SAS writes a note to the log stating that the file is native to another host or that its encoding differs from the current session encoding, and that Cross-Environment Data Access will be used, which might require additional CPU resources and might reduce performance. This is SAS performing translation work in the background, and whilst useful for continued access, it is not always ideal for regular production use.
There is an important nuance that changes things significantly with SAS 9.3. In 32-bit SAS on Windows, the data representation is WINDOWS_32, whilst in 64-bit SAS on Windows it is WINDOWS_64. Hemedinger notes that in SAS 9.3 the developers taught SAS for Windows to bypass the CEDA layer when the only encoding difference is WINDOWS_32 versus WINDOWS_64. SAS Knowledge Base article 38379 confirms this, stating that from SAS 9.3 onwards, Windows 32-bit data sets can be read, written and updated in Windows 64-bit SAS, and vice versa, as a result of a change in how SAS determines file compatibility at open time. Users on SAS 9.3 and later, including all SAS 9.4 maintenance releases, may therefore see fewer warnings and less friction with ordinary data sets originating in 32-bit Windows SAS.
Converting Data Sets to Native 64-Bit Format
Even with those SAS 9.3 improvements, many organisations prefer to convert files into the native 64-bit format rather than rely indefinitely on cross-environment access. For entire libraries, PROC MIGRATE is the recommended mechanism. SAS Usage Note 38339 notes that for releases preceding SAS 9.3, PROC MIGRATE can migrate 32-bit SAS data sets to 64-bit, changing their format so that CEDA is no longer required.
The advantages of PROC MIGRATE over the older conversion procedures are set out in detail by Diane Olson and David Wiehle of SAS Institute in their paper hosted by the University of Delaware. Unlike PROC COPY, PROC MIGRATE retains deleted observations, migrates audit trails, preserves all integrity constraints and automatically retains created and last-modified date/times, compression, encryption, indexes and passwords from the source library. It is designed to produce members in the target library that differ from the source only in being in the new SAS format.
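As a minimal sketch of the whole-library route, assuming one libref points at the 32-bit files and another at the 64-bit destination (both paths are placeholders), the step might look like this:
libname old32 "C:\sasdata\32bit";
libname new64 "C:\sasdata\64bit";

proc migrate in=old32 out=new64;   /* migrate every member to the native 64-bit format */
run;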
When the task concerns individual SAS data files rather than a whole library, SAS Usage Note 38339 points to PROC COPY with the NOCLONE option. Used in a 64-bit SAS session, this copies a 32-bit Windows data set into a new file that is native to the 64-bit environment. The NOCLONE option prevents SAS from cloning the original data representation during the copy, so that the resulting file is written in the target environment's native format and CEDA is no longer needed to process it. Thibault's PharmaSUG paper illustrates this with an example using PROC COPY with the NOCLONE option together with an OUTREP setting on the target LIBNAME statement to force creation in the desired representation.
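For a single data set, a hedged sketch of that approach, run in the 64-bit session with placeholder paths and member name, could look like the following:
libname src "C:\sasdata\32bit";
libname tgt "C:\sasdata\64bit" outrep=WINDOWS_64;   /* force the target representation */

proc copy in=src out=tgt noclone;   /* NOCLONE stops the 32-bit representation being carried over */
   select mydata;                   /* the individual data set to convert (illustrative) */
run;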
Catalogs: The Hard Problem
Catalogs are a different matter entirely. If a user running 64-bit SAS attempts to open a catalog created in a 32-bit SAS session, the familiar error appears: ERROR: CATALOG was created for a different operating system. In the case of format catalogs, a related message often reads ERROR: File LIBRARY.FORMATS.CATALOG was created for a different operating system, and this is frequently followed by failures to use user-defined formats attached to variables. As the SSCC guidance from the University of Wisconsin-Madison notes, this can prevent 64-bit SAS from reading the data set at all, with the error about formats recorded only in the log whilst the visible symptom is simply that the table did not open.
This matters because catalogs are machine-dependent. User-defined formats created by PROC FORMAT are usually stored in catalogs, often in a member named FORMATS. If those formats were built in 32-bit SAS, 64-bit SAS cannot use the catalog directly, and this affects not only explicit formatting in code but also routine data viewing because a data set linked to permanent user-defined formats may fail to display properly unless the associated format catalog is converted.
Options for Migrating Format Catalogs
There are several ways to address catalog incompatibility. If the original PROC FORMAT source code still exists, the cleanest option is simply to rerun it under 64-bit SAS, producing a fresh native catalog. The SSCC guidance treats this as the easiest solution that preserves the formats themselves, and it also describes a short-term workaround: adding a bare format female; statement to the DATA or PROC step, which removes the custom format from that variable, so there is no need to read the problem catalog file at all.
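As a minimal sketch of that short-term workaround, with an illustrative data set name and the variable from the SSCC example, the bare FORMAT statement detaches the user-defined format for the duration of the step, so the 32-bit catalog never needs to be opened:
proc freq data=mylib.survey;
   tables female;
   format female;   /* strip the custom format from FEMALE for this step only */
run;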
When source code is not available, transport-based conversion is the answer. In a 32-bit SAS session, PROC CPORT creates a transport file from the catalog library, and in a 64-bit SAS session, PROC CIMPORT recreates the catalog in the new environment. SAS Knowledge Base article KB0041614 provides sample code that creates a transport file in 32-bit SAS using proc cport lib=my32 file=trans memtype=catalog; select formats; and then unloads it in 64-bit SAS using PROC CIMPORT, after which a new Formats.sas7bcat file should be present in the target library. The same article notes that if access to a 32-bit SAS session is simply not available, the system option NOFMTERR can be submitted as a last resort: this allows the underlying data values to be displayed whilst user-defined formats are ignored, avoiding the error without converting the catalog.
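Expanding the quoted sample into a fuller sketch, with placeholder library paths and transport file location, the two-session process might look like this:
/* In the 32-bit SAS session: write the format catalog to a transport file */
libname my32 "C:\sasdata\32bit";
filename trans "C:\sasdata\formats.cpo";

proc cport lib=my32 file=trans memtype=catalog;
   select formats;
run;

/* In the 64-bit SAS session: rebuild a native catalog from that file */
libname my64 "C:\sasdata\64bit";
filename trans "C:\sasdata\formats.cpo";

proc cimport library=my64 infile=trans;
run;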
A more robust route for user-defined formats is to avoid moving the catalog as a catalog at all. PROC FORMAT can write format definitions to a standard SAS data set using CNTLOUT, and later rebuild them from that data set using CNTLIN. Because SAS data sets are generally portable across the 32-bit to 64-bit boundary, this method sidesteps the catalog incompatibility directly. KB0041614 describes CNTLOUT/CNTLIN as the most robust method available for migrating user-defined format libraries. Karin LaPann, writing in a poster presented at a meeting of the Philadelphia Area SAS Users Group, reaches the same conclusion and recommends always creating data sets from format catalogs and storing them alongside the data in the same library as a matter of good practice.
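A minimal sketch of that route, with placeholder librefs and an intermediate data set name, might be as follows:
/* In the 32-bit session: dump the format definitions to a portable data set */
libname fmt32 "C:\sasdata\32bit";

proc format library=fmt32.formats cntlout=fmt32.fmtdata;
run;

/* In the 64-bit session: rebuild a native catalog from that data set */
libname fmt64 "C:\sasdata\64bit";

proc format library=fmt64.formats cntlin=fmt32.fmtdata;
run;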
Caveats: Item Stores, Compiled Macros and the PIPE Engine
SAS Usage Note 38339 explicitly states that stored compiled macro catalogs are not supported by PROC CPORT and must be recompiled in the new operating environment, with SAS Note 46846 covering compatibility guidance for those files specifically. The note also warns that the 32-bit version of SAS should not be removed until it can be verified that all 32-bit catalogs have been successfully migrated.
Thibault's PharmaSUG paper identifies two further file types that require attention. SAS Item Store files (.sas7bitm), which organisations may use to store standard PROC TEMPLATE output templates, are not compatible across 32-bit and 64-bit environments, and the practical solution is to recreate them under the new environment using the same programme that created them originally, targeting a different output directory to avoid a mixed 32-bit and 64-bit directory. Thibault also notes that programmes using the PIPE engine may produce errors on Windows 64-bit environments, and recommends replacing such code with newer SAS functions such as filename, dopen and dread to avoid the issue altogether. These are not universal blockers, but they underline why testing is essential rather than assumed.
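As a hedged illustration of that recommendation, a directory listing built from SAS file functions rather than the PIPE engine might look like this, with a placeholder directory path:
filename mydir "C:\data\incoming";

data file_list;
   length fname $256;
   did = dopen('mydir');             /* open the directory                */
   if did > 0 then do;
      do i = 1 to dnum(did);         /* loop over its members             */
         fname = dread(did, i);      /* read each file or folder name     */
         output;
      end;
      rc = dclose(did);              /* release the directory when done   */
   end;
   else putlog 'ERROR: Could not open the directory.';
   keep fname;
run;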
Microsoft Office Integration After the Move
Another area where 64-bit moves catch users out is access to Microsoft Excel and Access files. The issue is not SAS data compatibility but the bit-ness of the Microsoft data providers. In 64-bit SAS for Windows, attempts to use PROC IMPORT with DBMS=EXCEL, PROC EXPORT with Excel or Access options, or LIBNAME EXCEL can fail with errors such as ERROR: Connect: Class not registered or Connection Failed. As Hemedinger explains, the cause is that the 64-bit SAS process cannot use the built-in data providers for Microsoft Excel or Microsoft Access, which are usually 32-bit modules. Thibault's paper confirms that installation of the PC Files Server on the same machine will be required, since the required 32-bit ODBC drivers are incompatible with 64-bit SAS on Windows.
The workarounds depend on the file type and local setup. SAS/ACCESS to PC Files provides methods such as DBMS=EXCELCS, DBMS=ACCESSCS and LIBNAME PCFILES, all of which use the PC Files Server as an intermediary, with an autostart feature that minimises configuration changes to existing SAS programmes. For .xlsx files, DBMS=XLSX removes the Microsoft data providers from the equation entirely and requires no additional setup from SAS 9.3 Maintenance 1 onwards. Installing 64-bit Microsoft Office may appear to solve the bit-ness mismatch by supplying 64-bit providers, but as Hemedinger cautions, Microsoft recommends the 64-bit version of Office in only a few circumstances, and that route can introduce other incompatibilities with how Office applications are used.
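For an .xlsx file, a hedged sketch of the provider-free route might look like the following, assuming SAS/ACCESS to PC Files is licensed and using placeholder file and sheet names:
proc import datafile="C:\data\results.xlsx"
            out=work.results
            dbms=xlsx
            replace;      /* overwrite WORK.RESULTS if it already exists */
   sheet="Sheet1";
run;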
Identifying 32-Bit Catalogs in a Mixed Environment
In mixed environments, a practical challenge is identifying which catalogs are still 32-bit and which are already 64-bit. This was precisely the problem Michael Raithel posed on LinkedIn in March 2015, after finding that no SAS facility, whether PROC CATALOG, PROC CONTENTS, PROC DATASETS or the Dictionary Tables, provided a direct way to distinguish them. His solution treats the .sas7bcat file as a flat file rather than a catalog, reading the first record and searching for the character strings W32_7PRO (identifying a 32-bit catalog) and X64_7PRO (identifying a 64-bit catalog). The macro he developed can be run against any number of catalogs and builds a SAS data set recording the bit-ness and full path of each file, making large-scale inventory automation entirely practical during a phased transition.
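The macro itself is Raithel's work, but a minimal sketch of the underlying idea for a single catalog, with a placeholder path, might look like this:
filename cat32 "C:\sasdata\formats.sas7bcat";

data catalog_bitness;
   infile cat32 recfm=f lrecl=32767 obs=1 truncover;   /* read the file header as raw text */
   input header $char32767.;
   length bitness $7;
   if index(header, 'W32_7PRO') then bitness = '32-bit';
   else if index(header, 'X64_7PRO') then bitness = '64-bit';
   else bitness = 'unknown';
   keep bitness;
run;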
For broader validation work, the Olson and Wiehle paper pairs PROC MIGRATE with macros based on PROC CONTENTS, PROC DATASETS and PROC COMPARE, documenting what existed in the source library before migration and verifying what exists in the target library afterwards. For highly regulated or large-scale environments, that kind of structured checking is not optional.
Navigating the Transition Without Unnecessary Disruption
The main lesson from all of this is that moving from 32-bit to 64-bit SAS on Windows is not simply a matter of reinstalling software and carrying on unchanged. Much will work as before, particularly with ordinary data sets and particularly in SAS 9.3 and later. Catalogs, format libraries, item stores and Microsoft Office integration, however, require deliberate attention.
The transition is not so much problematic as predictable. Keeping 32-bit SAS available until catalog migration is confirmed, using PROC MIGRATE for full libraries, using PROC COPY with NOCLONE for individual data sets, converting format catalogs via CPORT/CIMPORT or CNTLOUT/CNTLIN, recreating item stores and compiled macros in the new environment and testing Office-related workflows and PIPE based code before deployment together form a sound path through the process. With that preparation in place, the advantages of a 64-bit environment can be gained without avoidable disruption.
Modernising SAS: The 4GL Apps and SASjs Ecosystem
Custom interfaces to the world's most powerful analytics platform are no longer a niche concern. In many organisations, SAS remains central to reporting, modelling and operational decision-making, yet the way users interact with that capability can vary widely. Some teams still rely on desktop applications, batch processes, shared drives and manual interventions, while others are moving towards web-based interfaces, stronger governance and a more modern development workflow. The material at sasapps.io points to an ecosystem built around precisely that transition, blending long-standing SAS expertise with open-source tooling and documented delivery methods.
The Company Behind the Ecosystem
At the centre of this transition is 4GL Apps. The company's positioning is straightforward: help organisations leverage their SAS investment through services, solutions and products that fit specific needs. Rather than replacing SAS, the aim is to extend it with custom interfaces and delivery approaches that are maintainable, transparent and based on standard frameworks. An emphasis on documentation appears throughout the site, suggesting that projects are intended either for handover to internal teams or for ongoing support under clearly defined packages.
That proposition matters because many SAS environments have grown over years, sometimes decades. In such settings, technical capability is rarely the issue. The challenge is more often how to expose that capability in ways that are usable, secure and sustainable. A powerful analytics platform can still be hampered by awkward user journeys, brittle desktop tooling or resource-heavy support arrangements, and the 4GL Apps model tries to address those practical concerns without discarding existing SAS infrastructure.
Services
The service offering gives a useful sense of how this approach is organised. One strand is SAS App Delivery, framed not merely as building applications, but also as building tools that make SAS app development faster. That detail points to an emphasis on repeatability rather than one-off implementation. Another strand is SAS App Support, aimed at organisations with existing SAS-powered applications but insufficient internal resource to keep them running. Fixed-price plans are offered to keep those interfaces active, which implies an attempt to make operational costs more predictable. A third service area is SASjs Enhancement, where new features can be added to SASjs at a discounted rate to support particular use cases.
Solutions
These services sit alongside a broader set of solutions. One is the creation of SAS-powered HTML5 applications, described as bespoke builds tailored to specific workflow and reporting requirements, using fully open-source tools, standard frameworks and full documentation. Clients are given a practical choice: maintain the application in-house or use a transparent support package. Another solution addresses end-user computing risk through data capture and control. Here, the approach enables business users to self-load VBA-driven Excel reporting tools into a preferred database while applying data quality checks at source, a four-eyes (or more) approval step at each stage and full audit traceability back to the original EUC artefact. A further solution is the modernisation of legacy AF/SCL desktop applications, with direct migration to SAS 9 or Viya in order to improve user experience, security and scalability while moving to a modern SAS stack supported by open-source technology.
That last area reveals a theme running through the whole ecosystem: modernisation does not necessarily mean abandoning what exists. In many SAS estates, AF/SCL applications remain deeply embedded in business processes, and replacing them outright can be costly and risky, especially when they encode years of operational logic. A migration path that preserves business function while improving maintainability and interface design will naturally appeal to teams that need progress without disruption.
Products
The product range fills out the picture further. Data Controller for SAS enables business users to make controlled changes to data in SAS. The SASjs Framework is a collection of open-source tools to accelerate SAS DevOps and the development of SAS-powered web applications. There is also an AF/SCL Kit, migration tooling for the rapid modernisation of monolithic AF/SCL applications. Together, these products form a stack covering interface delivery, governed data change and development workflow, and they suggest that the company's work is not limited to consultancy but includes reusable software assets with their own documentation and source code.
Data Controller: Governance and Audit
Data Controller receives the richest functional description in the ecosystem's documentation. It is intended for business owners in regulatory reporting environments and, more broadly, for any enterprise that needs to perform manual data uploads with validation, approval, security and control. The rationale is rooted in familiar SAS working practices. Users may place files on network drives for batch loading, update data directly using SAS code, open a dataset in Enterprise Guide and change a value, or ask a database administrator to run a script update. According to the product's own documentation, those approaches are less than ideal: every new piece of data may require a new programme, end users may need to have `modify` access to sensitive data locations, datasets can become locked, and change requests can slow the process.
Data Controller is presented as a response to those weaknesses. The goal is described as focusing on great user experience and auditor satisfaction, while saving years of development and testing compared with a custom-built alternative. It is a SAS-powered web application with real-time capabilities, where intraday concurrent updates are managed using a lock table and queuing mechanism. Updates are aborted if another user has changed the table since the approval difference was generated, which helps preserve consistency in multi-user environments. Authentication and authorisation rely on the existing SASLogon framework, and end users do not require direct access to the target tables.
The governance model is equally central. All data changes require one or more approvals before a table is updated, and the approver sees only the changes that will be applied to the target, including new, deleted and changed rows. The system supports loading tables of different types through SAS libname engines, with support for retained keys, SCD2 loads, bitemporal data and composite primary keys. Full audit history is a prominent feature: users can track every change to data, including who made it, when it was made, why it was made and what the actual change was, all accessible through a History page.
A particularly notable feature is that onboarding new tables requires zero code. Adding a table is a matter of configuration performed within the tool itself, without the need to define column types or lengths manually, as these are determined dynamically at runtime. Workflow extensibility is built in through configurable hook scripts that execute before and after each action, with examples such as running a data quality check after uploading a mapping table or running a model after changing a parameter. Taken together, those features position Data Controller less as a narrow upload utility and more as a governed operational layer for business-managed data change.
The application was designed to work on multiple devices and different screen types, combined with SAS scalability and security to provide flexibility and location independence when managing data. This suggests it is intended for practical day-to-day use by business teams rather than solely by technical specialists at a desktop workstation.
SASjs: DevOps for SAS
Underpinning much of the ecosystem is SASjs, described on its GitHub organisation page as "DevOps for SAS." It is designed to accelerate the development and deployment of solutions on all flavours of SAS, including Viya, EBI and Base. Everything in SASjs is MIT open-source and free for commercial use. The framework also explicitly underpins Data Controller for SAS, which connects the product and framework strands of the wider ecosystem. The GitHub organisation page notes that the SASjs project and its repositories are not affiliated with SAS Institute.
The resources page at sasjs.io lists the key GitHub repositories: the Macro Core library, the SASjs adapter for bidirectional SAS and JavaScript communication, the SASjs CLI, a minimal seed application and seed applications for React and Angular. Documentation sites cover the adapter, CLI, Macro Core library, SASjs Server and Data Controller. Useful external links from the same resources page include guides to building and deploying web applications with the SASjs CLI, scaffolding SAS projects with NPM and SASjs, extending Angular web applications on Viya and building a vanilla JavaScript application on SAS 9 or Viya. There is also mention of a Viya log parser, training resources, guides, FAQs and a glossary, pointing to an effort to support both implementation and adoption.
The SASjs CLI
The command-line tooling, documented at cli.sasjs.io, gives a clearer view of how SASjs approaches DevOps. The CLI is described as a Swiss-army knife with a flexible set of options and utilities for DevOps on SAS Viya, SAS 9 EBI and SASjs Server. Its core functions include creating a SAS Git repository in an opinionated way, compiling each service with all dependent macros, macro variables and pre- or post-code, building the master SAS deployment, deploying through local scripts and remote SAS programmes, running unit tests with coverage and generating a Doxygen documentation site with data lineage, homepage and project logo from the configuration file. There is also a feature for deploying a frontend as a streaming application, bypassing the need to access the SAS web server directly.
The full command set covers the project lifecycle. The CLI can add and authenticate targets, compile and build projects, deploy them to a SAS server location, generate documentation and manage contexts, folders and files. It can execute jobs, run arbitrary SAS code from the terminal, deploy a service pack and generate a snippets file for macro autocompletion in VS Code. It can also lint SAS code to identify common problems and run unit tests while collecting results in JSON or CSV format, together with logs. In effect, this brings SAS development considerably closer to the workflows commonly seen in mainstream software engineering, which may be especially valuable in organisations trying to standardise delivery practices across mixed technology estates.
Presentations and the Wider SAS Community
The slides.sasjs.io collection adds another dimension by showing that these ideas have been presented in conference and user group settings. Available decks cover DevOps for MSUG, SUGG and WUSS, SASjs for application development, SASjs Server, AF and AF/SCL modernisation, SASjs for PHUSE, testing and a legacy SAS apps presentation for FANS in January 2023. While slide decks alone do not prove adoption or outcomes, they do show a sustained effort to communicate methods and patterns to the broader SAS community, consistent with the open documentation and MIT licensing found throughout the ecosystem.
Building a Modern Layer Around an Established Platform
The most useful way to understand this ecosystem is not as a single product but as a layered approach. At one level, there are services for building and supporting applications. At another, there are packaged tools such as Data Controller and the AF/SCL Kit. Underneath both sits SASjs, providing open-source components and delivery practices intended to make SAS development more structured and scalable. The combination of bespoke SAS-powered HTML5 applications, governed data update tooling, AF/SCL migration support and open-source DevOps utilities points to a coherent effort to modernise how SAS is delivered and used, without severing ties to established platforms. SAS remains the analytical engine, but the interfaces, workflows and operational controls around it are updated to reflect current expectations in web application design, governance and DevOps practice.
Launching SAS Analytics Pro on Viya with automated Docker image clean-up
For my freelancing, I have a licensed version of SAS Analytics Pro running in a Docker container on my main Linux workstation. Every time there is a new release, a new Docker image is made available, which means that a few of them could accumulate on your system. Aside from taking up disk space that could have other uses, it also makes it tricky to automate the startup of the associated Docker container. Avoiding this means pruning the Docker images available on the system, something that also needs automation.
To make things clearer, let me work through the launch script that I use; this is called by the script that checks for and then downloads any new image that is available, should that be needed. First up is the shebang, and this uses the -e switch to exit the script in the event of an error. That stops later commands from executing without the input they need and causing potentially destructive outcomes.
#!/bin/bash -e
Next comes the command to shut down the existing container. If a new image were instated while the old container was still running, the old image would remain in use, preventing its removal. In any case, performing the remaining steps with an already running container would produce errors.
if docker container ls -a --format '{{.Names}}' | grep -q '^sas-analytics-pro$'; then
docker container stop sas-analytics-pro
fi
After that comes the step that finds the latest image. At one time, I did this by looping through image ages in days, weeks and months, hardly an elegant or robust approach. What follows is altogether more effective.
# Find latest SAS Analytics Pro image
IMAGE=$(docker image ls --format '{{.Repository}}:{{.Tag}} {{.CreatedAt}}' \
| grep 'sas-analytics-pro' \
| sort -k2,3r \
| head -n 1 \
| awk '{print $1}')
echo "Chosen image: $IMAGE"
Since there is quite a lot happening above, let us unpack the actions. The first part lists all Docker images, formatting each line to show the image name (repository:tag) followed by its creation timestamp: docker image ls --format '{{.Repository}}:{{.Tag}} {{.CreatedAt}}'. The next piece picks out all the images that are for SAS Analytics Pro: grep 'sas-analytics-pro'. The crucial step, sort -k2,3r, comes next and this sorts the results by the second and third fields (the creation date and time) in reverse order, so the newest images appear first. With that done, it is time to pick out the most recent image using head -n 1. To pick out the image name, you need awk '{print $1}'. All of this is wrapped within IMAGE=$(...) to assign the result to a variable, which is then printed to the console using an echo statement.
With the image selected, you can then spin up the container once you specify the other parameters to use and allow some sleep time afterwards before proceeding to the clean-up steps:
run_args="
-e SASLOCKDOWN=0
--name=sas-analytics-pro
--rm
--detach
--hostname sas-analytics-pro
--env RUN_MODE=developer
--env SASLICENSEFILE=[Path to SAS licence file]
--publish 8080:80
--volume ${PWD}/sasinside:/sasinside
--volume ${PWD}/sasdemo:/data2
--volume [location of SAS files on the system]:/data
--cap-add AUDIT_WRITE
--cap-add SYS_ADMIN
--publish 8222:22
"
if ! docker run -u root ${run_args} "$IMAGE" "$@" > /dev/null 2>&1; then
echo "Failed to run the image."
exit 1
fi
sleep 5
With the new container in action, the subsequent step is to find the older images and delete those. Again, the docker image command is invoked, with its output fed to a selection command for SAS Analytics Pro images. Once the current image has been removed from the listing by the grep -v command, the list of images to be deleted is assigned to the IMAGES_TO_REMOVE variable.
IMAGES_TO_REMOVE=$(docker image ls --format '{{.Repository}}:{{.Tag}}' \
| grep 'sas-analytics-pro' \
| grep -v "^$IMAGE$")
echo "Will remove older images:"
echo "$IMAGES_TO_REMOVE"
After that has happened, iterating through the list of images using a for loop will remove them one at a time using the docker image rm command:
for OLD in $IMAGES_TO_REMOVE; do
echo "Removing $OLD"
docker image rm "$OLD" || echo "Could not remove $OLD"
done
All this concludes the operation of spinning up a new SAS Analytics Pro Docker container while also removing any superseded Docker images. One last step is to capture the password to use for logging into the SAS Studio interface that is available at localhost:8080 or whatever address and port is being used to serve the application:
docker logs sas-analytics-pro 2>&1 | grep "Password=" > pw.txt
Folding updating and housekeeping into the same activity as spinning up the Docker container means that I need not think of doing anything else. The time taken by the extra steps repays the effort by always having the latest version running in a tidy environment, and it saves having to remember to do all of this by hand, which is what would be required without the automation.
WARNING: Unsupported device 'SVG' for RTF destination. Using default device 'EMF'.
When running SAS programs to create some outputs, I met with the above message because I was sending output to an RTF file and the RTF destination does not support SVG graphics. The SVG setting must be a system default because there was nothing in the programs that should trigger the warning. Getting rid of the message uses one of two approaches:
goptions device=emf;
ods graphics / outputfmt=emf;
The first applies to legacy code generating graphics using SAS/GRAPH procedures, while the second is for the more modern ODS graphics procedures like SGPLOT. While the first one may work in all cases, that cannot be assumed without further testing.
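To show the second approach in context, here is a minimal sketch with a placeholder output path and an illustrative plot:
ods graphics / outputfmt=emf;        /* avoid the unsupported SVG device */
ods rtf file="C:\outputs\report.rtf";

proc sgplot data=sashelp.class;
   scatter x=height y=weight;
run;

ods rtf close;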
There was another curiosity in this: the system setup should have taken care of things to stop the warning from appearing in the log. However, these were programs created from SAS logs to capture all code generated by any SAS macros that were called for regulatory submission purposes. Thus, they were self-contained and missed off the environment setup code, which is how I came to see what I did.
Platform-independent file deletion in SAS programming
There are times when you need to delete a file from within a SAS program that is not one of its built-in types, such as data sets or catalogues. Many choose to use X commands or the SYSTASK facility, even though these issue operating system-specific commands that obstruct portability; it should also be recalled that system administrators can make them unavailable to users. Thus, it is better to use methods provided within SAS itself.
The first step is to define a file reference pointing to the file that you need to delete:
filename myfile '<full path of file to be removed>';
With that accomplished, you can move on to deleting the file in a null data step, one where no output file is generated. The FDELETE function does that in the code below using the MYFILE file reference declared already. The function returns a code that is used to provide feedback to the user, with 0 indicating success and non-zero values indicating failure.
data _null_;
rc = fdelete('myfile');
if rc = 0 then put 'File deleted successfully.';
else put 'Error deleting file. RC=' rc;
run;
With the above out of the way, you then can clear the file reference to tidy up things afterwards:
filename myfile clear;
Some may consider SAS programs to be single-use items that never move from one system to another. However, I have been involved in several such migrations, particularly between Windows, UNIX and Linux, so I know the value of keeping things streamlined, and some SAS programs are multi-use anyway. Anything that cuts down on workload when there is much else to do cannot be discounted.
SAS Packages: Revolutionising code sharing in the SAS ecosystem
In the world of statistical programming, SAS has long been the backbone of data analysis for countless organisations worldwide. Yet, for decades, one of the most significant challenges facing SAS practitioners has been the efficient sharing and reuse of code. Knowledge and expertise have often remained siloed within individual developers or teams, creating inefficiencies and missed opportunities for collaboration. Enter the SAS Packages Framework (SPF), a solution that changes how SAS professionals share, distribute and utilise code across their organisations and the broader community.
The Problem: Fragmented Knowledge and Complex Dependencies
Anyone who has worked extensively with SAS knows the frustration of trying to share complex macros or functions with colleagues. Traditional code sharing in SAS has been plagued by several issues:
- Dependency nightmares: A single macro often relies on dozens of utility macros working behind the scenes, making it nearly impossible to share everything needed for the code to function properly
- Version control chaos: Keeping track of which version of which macro works with which other components becomes an administrative burden
- Platform compatibility issues: Code that works on Windows might fail on Linux systems and vice versa
- Lack of documentation: Without proper documentation and help systems, even the most elegant code becomes unusable to others
- Knowledge concentration: Valuable SAS expertise remains trapped within individuals rather than being shared with the broader community
These challenges have historically meant that SAS developers spend countless hours reinventing the wheel, recreating functionality that already exists elsewhere in their organisation or the wider SAS community.
The Solution: SAS Packages Framework
The SAS Packages Framework, developed by Bartosz Jabłoński, represents a paradigm shift in how SAS code is organised, shared and deployed. At its core, a SAS package is an automatically generated, single, standalone zip file containing organised and ordered code structures, extended with additional metadata and utility files. This solution addresses the fundamental challenges of SAS code sharing by providing:
- Functionality over complexity: Instead of worrying about 73 utility macros working in the background, you simply share one file and tell your colleagues about the main functionality they need to use.
- Complete self-containment: Everything needed for the code to function is bundled into one file, eliminating the "did I remember to include everything?" problem that has plagued SAS developers for years.
- Automatic dependency management: The framework handles the loading order of code components and automatically updates system options like cmplib= and fmtsearch= for functions and formats (see the sketch after this list for what those options normally involve).
- Cross-platform compatibility: Packages work seamlessly across different operating systems, from Windows to Linux and UNIX environments.
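For context on what the framework automates here, a manual equivalent, with hypothetical dataset and catalogue names, would be something along these lines:
/* Manually registering FCMP functions and a formats location, which the
   framework would otherwise handle when a package is loaded
   (work.myfuncs and work.myfmts are hypothetical names) */
options cmplib=work.myfuncs fmtsearch=(work.myfmts library);
With packages, these settings are appended automatically as components load, so the user does not need to manage them by hand.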
Beyond Macros: A Spectrum of SAS Functionality
One of the most compelling aspects of the SAS Packages Framework is its versatility. While many code-sharing solutions focus solely on macros, SAS packages support a wide range of SAS functionality:
- User-defined functions (both FCMP and CASL)
- IML modules for matrix programming
- PROC PROTO C routines for high-performance computing
- Custom formats and informats
- Libraries and datasets
- PROC DS2 threads and packages
- Data generation code
- Additional content such as documentation PDF files
This comprehensive approach means that virtually any SAS functionality can be packaged and shared, making the framework suitable for everything from simple utility macros to complex analytical frameworks.
Real-World Applications: From Pharmaceutical Research to General Analytics
The adoption of SAS packages has been particularly notable in the pharmaceutical industry, where code quality, validation and sharing are critical concerns. The PharmaForest initiative, led by PHUSE Japan's Open-Source Technology Working Group, exemplifies how the framework is being used to revolutionise pharmaceutical SAS programming. PharmaForest offers a collaborative repository of SAS packages specifically designed for pharmaceutical applications, including:
- OncoPlotter: A comprehensive package for creating figures commonly used in oncology studies
- SAS FAKER: Tools for generating realistic test data while maintaining privacy
- SASLogChecker: Automated log review and validation tools
- rtfCreator: Streamlined RTF output generation
The initiative's philosophy captures perfectly the spirit of the SAS Packages Framework: "Through SAS packages, we want to actively encourage sharing of SAS know-how that has often stayed within individuals. By doing this, we aim to build up collective knowledge, boost productivity, ensure quality through standardisation and energise our community".
The SASPAC Archive: A Growing Ecosystem
The establishment of SASPAC (SAS Packages Archive) represents the maturation of the SAS packages ecosystem. This dedicated repository serves as the official home for SAS packages, with each package maintained as a separate repository complete with version history and documentation. Some notable packages available through SASPAC include:
- BasePlus: Extends BASE SAS with functionality that many developers find themselves wishing was built into SAS itself. With 12 stars on GitHub, it's become one of the most popular packages in the archive.
- MacroArray: Provides macro array functionality that simplifies complex macro programming tasks, addressing a long-standing gap in SAS's macro language capabilities.
- SQLinDS: Enables SQL queries within data steps, bridging the gap between SAS's powerful data step processing and SQL's intuitive query syntax.
- DFA (Dynamic Function Arrays): Offers advanced data structures that extend SAS's analytical capabilities.
- GSM (Generate Secure Macros): Provides tools for protecting proprietary code while still enabling sharing and collaboration.
Getting Started: Surprisingly Simple
Despite these capabilities, getting started with SAS packages is fairly straightforward. The framework can be deployed in multiple ways, depending on your needs. For a quick test or one-time use, you can enable the framework directly from the web:
filename packages "%sysfunc(pathname(work))";
filename SPFinit url "https://raw.githubusercontent.com/yabwon/SAS_PACKAGES/main/SPF/SPFinit.sas";
%include SPFinit;
For permanent installation, you simply create a directory for your packages and install the framework:
filename packages "C:\SAS_PACKAGES";
%installPackage(SPFinit)
Once installed, using packages becomes as simple as:
%installPackage(packageName)
%helpPackage(packageName)
%loadPackage(packageName)
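As an illustrative session, assuming the framework has already been enabled as shown above and using the BasePlus package mentioned earlier (the directory path is hypothetical), the flow might look like this:
/* Point the packages fileref at an installation directory (path hypothetical) */
filename packages "C:\SAS_PACKAGES";
/* Fetch and install the package into that directory */
%installPackage(BasePlus)
/* Load its contents into the current session and display its help */
%loadPackage(BasePlus)
%helpPackage(BasePlus)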
Developer Benefits: Quality and Efficiency
For SAS developers, the framework offers numerous advantages that go beyond simple code sharing:
- Enforced organisation: The package development process naturally encourages better code organisation and documentation practices.
- Built-in testing: The framework includes testing capabilities that help ensure code quality and reliability.
- Version management: Packages include metadata such as version numbers and generation timestamps, supporting modern DevOps practices.
- Integrity verification: The framework provides tools to verify package authenticity and integrity, addressing security concerns in enterprise environments.
- Cherry-picking: Users can load only specific components from a package, reducing memory usage and namespace pollution.
The Future of SAS Code Sharing
The growing adoption of SAS packages represents more than just a new tool: it signals a fundamental shift towards a more collaborative and efficient SAS ecosystem. The framework's MIT licensing and fully open-source nature ensure that it remains accessible to all SAS users, from individual practitioners to large enterprise installations. This democratisation of advanced code-sharing capabilities levels the playing field and enables even small teams to benefit from enterprise-grade development practices.
As the ecosystem continues to grow, with contributions from pharmaceutical companies, academic institutions and individual developers worldwide, the SAS Packages Framework is proving that the future of SAS programming lies not in isolated development, but in collaborative, community-driven innovation.
For SAS practitioners looking to modernise their development practices, improve code quality and tap into the collective knowledge of the global SAS community, exploring SAS packages isn't just an option; it's becoming an essential step towards more efficient and effective statistical programming.
Observations from selected sessions of SAS Innovate 2024
SAS Innovate 2024 provided insight into evolving approaches to analytics modernisation, platform development and applied data science across multiple industries. This document captures observations from sessions addressing strategic platform migrations, unified analytics environments, enterprise integration patterns and practical applications in regulated sectors. The content reflects a discipline transitioning from experimental implementations to production-grade, business-critical infrastructure.

Strategic Platform Modernisation
A presentation from DNB Bank detailed the organisation's migration from SAS 9.4 to SAS Viya on Microsoft Azure. The strategic approach proved counter-intuitive: whilst SAS Viya supports both SAS and Python code seamlessly, DNB deliberately chose to rewrite their legacy SAS code library into Python. The rationale combined two business objectives. First, expanding the addressable talent market by tapping into the global Python developer pool. Second, creating a viable exit strategy from their primary analytics vendor, ensuring compliance with financial regulatory requirements to demonstrate realistic vendor transition options within 30 to 90 days.
This decision represents a fundamental shift in enterprise software value propositions. Competitive advantage no longer derives from creating vendor lock-in, but from providing powerful, stable and governed environments that fully embrace open-source tools. The winning strategy involves convincing customers to remain because the platform delivers undeniable value, not because departure presents insurmountable difficulty. It signals a maturing market, where value flows through partnership rather than proprietary constraints.
Unified Analytics Environments
A healthcare analytics presentation addressed the persistent debate between low-code/no-code interfaces for business users and professional coding environments for data scientists. Two analysts tackled identical problems (predicting diabetes risk factors using a public CDC dataset) using different approaches within the same platform.
The low-code user employed SAS Viya's Model Studio, a visual interface. This analyst assessed the model for statistical bias against variables such as age and gender by selecting a configuration option, whereupon the platform automatically generated fairness statistics and visualisations.
The professional coder used SAS Viya Workbench, a code-first environment similar to Visual Studio Code. This analyst manually wrote code to perform identical bias assessments. However, direct code access enabled fine-tuning of variable interactions (such as age and cholesterol), ultimately producing a logistic regression model with marginally superior performance compared to the low-code approach.
The demonstration illustrated that the debate presents a false dichotomy. The actual value resides in unified platforms, enabling both personas to achieve exceptional productivity. Citizen data scientists can rapidly build and validate baseline models, whilst expert coders can refine those same models with advanced techniques and deploy them, all within a single ecosystem. This unified approach characterises disciplinary development, where focus shifts from tribal tool debates to collective problem-solving.
Analytics as Enterprise Infrastructure
Multiple architectural demonstrations illustrated analytics platforms evolving beyond sophisticated workbenches for specialists into the central nervous system of enterprise operations. Three distinct patterns emerged:
The AI Assistant Architecture: A demonstration featured a customer-facing AI assistant built with Azure OpenAI. When users interacted with the chatbot regarding credit risk, requests were routed through an Azure Logic App, not to the large language model for decisions, but to a SAS Intelligent Decisioning engine. The SAS engine functioned as the trusted decision core, executing business rules and models to generate real-time risk assessments, which returned to the chatbot for customer delivery. SAS provided not the interface but the automated decision engine.
The Digital Twin Pattern: A pharmaceutical use case described using historical data from penicillin manufacturing batches to train machine learning models. These models became digital twins of physical bioreactors. Rather than conducting costly and time-consuming physical experiments, researchers executed thousands of in silico simulated experiments, adjusting parameters in the model to discover the optimal recipe for maximising yield (the "Golden Batch").
The Microsoft 365 Automation Hub: A workflow demonstration showed SAS programmes functioning as critical nodes in Microsoft 365 ecosystems. The automated process involved SAS code accessing SharePoint folders, retrieving Excel files, executing analyses, generating new reports as Excel files and delivering those reports directly into Microsoft Teams channels for business users.
These patterns mark a profound evolution. Analytics platforms are moving beyond being sophisticated calculators for experts, becoming foundational infrastructure: the connective tissue enabling intelligent automation and integrating disparate systems such as cloud office suites, AI interfaces and industrial hardware into cohesive business processes. This evolution from specialised tool to core infrastructure clearly indicates analytics' growing maturity within enterprise contexts.
Applied Data Science in High-Stakes Environments
Whilst much data science narrative focuses on e-commerce recommendations or marketing optimisation, compelling applications tackle intensely human, high-stakes operational challenges. Heather Hallett, a former ICU nurse and healthcare industry consultant at SAS, presented on improving hospital efficiency.
She described the challenge of staffing intensive care units, where having appropriate nurse numbers with correct skills proves critical. Staffing decisions constitute "life and death decisions". Her team uses forecasting models (such as ARIMA) to predict patient demand and optimisation algorithms (including mixed-integer programming) to create optimal nurse schedules. The optimisation addresses more than headcount; it matches nurses' specific skills, such as certifications for complex assistive devices like intra-aortic balloon pumps, to forecasted needs of the sickest patients.
A second use case applied identical operational rigour to community care. Using the classic "travelling salesman solver" from optimisation theory, the team planned efficient daily routes for mobile care vans serving maximum numbers of patients in their homes, delivering essential services to those unable to reach hospitals easily.
These applications ground abstract concepts of forecasting and optimisation in deeply tangible human contexts. They demonstrate that, beyond driving revenue or reducing costs, the ultimate purpose of data science and operational analytics can be directly improving and even saving human lives. This application of sophisticated mathematics to life preservation marks data science's evolution from commercial tool to critical component of human-centred operations.
Transparency as Competitive Advantage
In highly regulated industries such as pharmaceuticals, generating trustworthy research proves paramount. A presentation from Japanese pharmaceutical company Shionogi detailed how they transform the transparency challenge in Real-World Evidence (RWE) into competitive advantage.
The core problem with RWE studies, which analyse data from sources such as electronic health records and insurance claims, involves their historical lack of standardisation and transparency compared to randomised clinical trials, leading regulators and peers to question validity. Shionogi's solution is an internal system called "AI SAS for RWE", addressing the challenge through two approaches:
Standardisation: The system transforms disparate Real-World Data from various vendors into a Shionogi-defined common data model based on OMOP principles, ensuring consistency where direct conversion of Japanese RWD proves challenging.
Semi-Automation: It semi-automates the entire analysis workflow, from defining research concepts to generating final tables, figures and reports.
The most innovative aspect involves its foundation in radical transparency. The system automatically records every research process step: from the initial concept suite where analysis is defined, through specification documents, final analysis programmes and resulting reports, directly into Git. This creates a complete, immutable and auditable history of exactly how evidence was generated.
This represents more than a clever technical solution; it constitutes profound strategic positioning. By building transparent, reproducible and efficient systems for generating RWE, Shionogi directly addresses core industry challenges. They work to increase research quality and trustworthiness, effectively transforming a regulatory burden into a competitive edge built on integrity. This move towards provable, auditable results is the hallmark of a discipline transitioning from experimental art to industrial-grade science.
User Experience as Productivity Multiplier
In complex data tool contexts, user experience (UX) has evolved beyond "nice-to-have" aesthetic features into a central product strategy pillar, directly tied to user productivity and talent acquisition. A detailed examination of the upcoming complete rewrite of SAS Studio illustrated this point.
The motivation for the massive undertaking proved straightforward: the old architecture was slow and becoming a drag on user productivity. The primary goal for the new version involved making a web-based application "feel like a desktop application" regarding speed and responsiveness. To achieve this, the team focused on improvements directly boosting productivity for coders and analysts:
A Modern Editor: Integrating the Monaco editor used in the widely popular Visual Studio Code, providing familiar and powerful coding experiences.
Smarter Assistance: Improving code completion and syntax help to reduce errors and time spent consulting documentation.
Better Navigation: Adding features such as code "mini-maps" enabling programmers to navigate thousands of lines of code instantly.
For modern technical software, UX has become a fundamental competitive differentiator. Faster, more intuitive and less frustrating tools do not merely improve existing user satisfaction; they enhance productivity. In competitive markets for top data science and engineering talent, providing a best-in-class user experience represents a key strategy for attracting and retaining exceptional people. The next leap in team productivity might derive not from new algorithms but from superior interfaces.
Conclusion
These observations from SAS Innovate 2024 illustrate a discipline maturing. Data science is moving beyond isolated experiments and "science projects", becoming pragmatic, integrated, transparent and deeply human business functionality. Focus shifts from algorithmic novelty to real-world application value (whether enabling better user experiences, building regulatory trust or making life-or-death decisions on ICU floors).
As analytics becomes more integrated and accessible, the challenge involves identifying where it might unexpectedly transform core processes within organisations, moving from specialist concern to foundational infrastructure enabling intelligent, automated and human-centred operations.
Expanding the coding toolkit: Adding R and Python in a changing landscape
Over the years, I have taught myself a number of computing languages, with some coming in useful for professional work while others came in handy for website development and maintenance. The collection has grown to include HTML, CSS, XML, Perl, PHP and UNIX shell scripting. The ongoing pandemic allowed me to add two more to the repertoire: R and Python.
My interest in these arose from my work as an information professional concerned with standardisation and automation of statistical results delivery. To date, the main focus has been on clinical study data, but ongoing changes in the life sciences sector could mean that I may need to look further afield, so having extra knowledge never hurts. Though I have been a SAS programmer for more than twenty years, its predominance in the clinical research field is not what it was, which causes me to rethink things.
As it happens, I would like to continue working with SAS, since it does so much, and thoughts of leaving it behind bring sadness. Still, it helps to know what the alternatives might be and to temper some management hopes about any newcomers, especially regarding the amount of code that needs to be produced and the quality of the graphs being created. Use cases need to be assessed dispassionately, even when emotions loom behind the scenes.
Since both R and Python bring large scripting ecosystems with active communities, the attraction of adopting them makes a good deal of sense. SAS is comparable in the scale of its own ecosystem, though there are considerable differences, and the platform is catching up when it comes to data science. While the aforementioned open-source languages may have had a head start, it appears that others are not standing still either. It is a time to have wider awareness, and online conference attendance helps with that.
The breadth of what is available for any programming language stymies any attempt to create a truly all-encompassing starting point, and I have abandoned thoughts of doing anything like that for R. Similarly, I will not even try such a thing for Python. Consequently, my sharing of anything learned will take the form of discrete postings from time to time, especially given how easy it is to collect numerous website links for sharing.
The learning has been facilitated by ongoing pandemic restrictions, though things are opening up a little now. The pandemic also has given us public data that can be used for practice, since much can be gained from having one's own project instead of completing exercises from a book. Having an interesting dataset with which to work is a must, and COVID-19 data hold a certain personal interest as well, while one remains mindful of the suffering and loss of life that has been happening since the pandemic first took hold.