Coding Notebook

15:51, 9th September 2021

SAS FILENAME Statement: EMAIL (SMTP) Access Method

The FILENAME statement's EMAIL access method in SAS allows users to send electronic mail programmatically via SMTP, and it supports a wide range of options for customising messages, including specifying recipients, carbon copy and blind carbon copy addresses, subject lines, message priority, sensitivity levels, expiration dates and file attachments. When SAS is operating in a locked-down state, the feature is unavailable unless re-enabled by a server administrator.

Users can incorporate conditional logic within a data step to control which recipients receive which messages and can direct procedure output, images and HTML content through email. PUT statement directives provide a way to override or modify message attributes at runtime, such as changing the recipient address, subject or attached files, and they also allow actions like sending a message mid-step or clearing existing message attributes. Secure communication with SMTP servers is supported through Transport Layer Security, though message-level encryption and digital signing are not currently available. Additional system options allow users to adjust the server response wait time and manage UTC offset settings for messages sent across time zones.
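The message attributes listed above are not specific to SAS. As a rough illustration only, the sketch below assembles a comparable message with Python's standard library rather than the FILENAME statement. The function names, the addresses and the use of an `Importance` header to stand in for priority are all assumptions made for the example, and the `send` helper is defined but never called.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, to, cc, bcc, subject, body, attachment=None):
    """Assemble a message with recipients, CC/BCC, subject and one attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(to)
    if cc:
        msg["Cc"] = ", ".join(cc)
    if bcc:
        msg["Bcc"] = ", ".join(bcc)
    msg["Subject"] = subject
    # Assumption: many mail clients read this header as the message priority.
    msg["Importance"] = "high"
    msg.set_content(body)
    if attachment is not None:
        name, data = attachment  # (file name, raw bytes)
        msg.add_attachment(
            data,
            maintype="application",
            subtype="octet-stream",
            filename=name,
        )
    return msg


def send(msg, host, port=587):
    """Send over SMTP, upgrading the connection to TLS first (hypothetical host)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        smtp.send_message(msg)


example = build_message(
    sender="reports@example.com",
    to=["analyst@example.com"],
    cc=["manager@example.com"],
    bcc=[],
    subject="Monthly report",
    body="The monthly figures are attached.",
    attachment=("report.csv", b"month,total\n2021-08,42\n"),
)
```

Building the message separately from sending it mirrors the way the SAS method sets attributes first and transmits afterwards, and it allows the attributes to be checked without an SMTP server.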

15:50, 9th September 2021

How to send email using SAS

Sending emails via SAS presents several challenges, from configuration issues to security considerations. Users often encounter problems such as connection refusals, authentication failures, or emails not appearing in sent folders. These issues typically stem from incorrect SMTP settings, authentication methods, or limitations in how SAS interacts with email servers. For instance, messages sent over SMTP are not saved to the sender's sent folder; MAPI does save them but is less reliable for automation.

Security is another concern, as the FROM address can be spoofed, though ISPs and email services may block such attempts. Specific examples include configuring SendGrid with SAS, where proper authentication (like using the correct API key format) is crucial. Solutions often involve verifying SMTP server details, ensuring correct port numbers and using appropriate authentication types. Additionally, restricting the FROM address to a fixed value or the user's account requires careful configuration, as SAS does not enforce this by default.

14:03, 26th August 2021

Using SYSTASK and SAS macro loops for massively parallel processing

As data volumes continue to grow at a rapid pace, sequential processing increasingly falls short of meeting the demands of timely data analysis, making parallel processing a valuable alternative. A practical approach to parallelisation in SAS environments without SAS/CONNECT combines a shell script, a main SAS programme and individual thread programmes to handle a monthly data ingestion scenario.

The shell script launches the main SAS programme in the background, passing a year-month parameter to control which data are processed. The main programme is divided into three stages: pre-processing, which captures input parameters and calculates the number of days in the relevant month; parallel processing, which uses a SAS macro loop to generate a series of SYSTASK statements that spawn separate SAS sessions simultaneously, each responsible for ingesting a single day's CSV file; and post-processing, which consolidates the resulting daily data tables into a single monthly table.

The WAITFOR statement ensures that the main session pauses until all parallel threads have completed before the final consolidation step runs. Each thread programme writes its output directly to the WORK library of the main SAS session, making the data readily available for that final step. An alternative to the macro loop approach is to use CALL EXECUTE within a data step, which produces a comparable outcome by generating and executing SYSTASK statements sequentially whilst still allowing them to run in parallel at the operating system level.
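The three-stage pattern described above can be sketched outside SAS as well. The following is a minimal Python analogue, not the SAS implementation: it uses worker threads where SYSTASK would spawn separate SAS sessions, and the `ingest_day` function, the file naming scheme and the worker count are hypothetical stand-ins for the real thread programmes.

```python
import calendar
from concurrent.futures import ThreadPoolExecutor


def ingest_day(year, month, day):
    """Stand-in for one thread programme handling a single day's CSV file."""
    # Hypothetical file name; a real job would read and parse the file here.
    return {"day": day, "file": f"data_{year}{month:02d}{day:02d}.csv"}


def run_month(yyyymm):
    # Pre-processing: capture the parameter and count the days in the month.
    year, month = int(yyyymm[:4]), int(yyyymm[4:])
    n_days = calendar.monthrange(year, month)[1]

    # Parallel processing: one task per day, all submitted in a loop.
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [
            pool.submit(ingest_day, year, month, day)
            for day in range(1, n_days + 1)
        ]
        # Rough equivalent of WAITFOR: block until every task has finished.
        daily = [future.result() for future in futures]

    # Post-processing: consolidate the daily results into one monthly result.
    return sorted(daily, key=lambda row: row["day"])


monthly = run_month("202102")
```

Collecting every result before the consolidation step plays the same role as the WAITFOR statement: the final stage cannot start until all daily tasks have completed.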

21:31, 24th August 2021

SASPy Examples

This GitHub repository provides sample Python notebooks demonstrating the functionality of SASPy, a tool for interacting with SAS software. It includes examples created by SAS, user-submitted contributions and notebooks addressing specific issues or their resolutions. The repository outlines guidelines for contributing, specifies an Apache Licence 2.0 for use, and links to external resources such as the Python website and SASPy documentation.

21:30, 24th August 2021

SASPy Documentation

This Python module facilitates interaction with SAS systems by offering application programming interfaces that allow users to initiate SAS sessions, execute analytical procedures and transfer data between SAS datasets and Pandas dataframes, alongside exchanging values with SAS macro variables. It supports connections to SAS on the same or remote hosts, provides methods for data exploration such as describe and head, and integrates additional functionalities like machine learning and econometrics through dedicated Python classes. The module requires Python 3.4 or higher, SAS 9.4 or later and Java 7 or higher for specific connection methods, enabling compatibility across various SAS platforms.

10:58, 23rd August 2021

Using the Hash Object to Store and Retrieve Data in SAS

The hash object in SAS is an in-memory mechanism for efficient data storage and retrieval, using lookup keys to locate specific values. To use it, a developer must declare and instantiate the object, then define keys and data variables using dot notation method calls, specifically the DEFINEKEY, DEFINEDATA and DEFINEDONE methods.

By default, each key must be unique, though the MULTIDATA argument tag allows multiple data values to be associated with a single key, which can then be traversed using methods such as FIND, FIND_NEXT, FIND_PREV, HAS_NEXT and HAS_PREV.

Data are stored using the ADD method and retrieved using the FIND method, while the REF method can combine both operations into a single call. The SUMINC argument tag enables a running numerical summary to be maintained for each key, updated automatically whenever certain methods are called.

Two attributes, NUM_ITEMS and ITEM_SIZE, allow developers to retrieve the number of stored items and the approximate memory being consumed by the object. The hash object can also be loaded directly from an existing data set using the dataset argument tag, and where duplicate keys are present, the DUPLICATE argument tag controls how they are handled.
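For readers more familiar with Python, the behaviour described above can be approximated with a dictionary. The class below is a loose analogue and not the SAS API: the class name, the method names and the Boolean return values are assumptions chosen for illustration, and it models only unique versus multiple data values per key, a combined find-or-add call and an item count.

```python
class HashTable:
    """Toy dictionary-backed stand-in for a keyed in-memory lookup table."""

    def __init__(self, multidata=False):
        self.multidata = multidata
        self._items = {}  # key -> list of data values

    def add(self, key, data):
        """Store data under key; return False if a duplicate key is rejected."""
        values = self._items.setdefault(key, [])
        if values and not self.multidata:
            return False
        values.append(data)
        return True

    def find(self, key):
        """Return the first data value for key, or None if the key is absent."""
        values = self._items.get(key)
        return values[0] if values else None

    def find_all(self, key):
        """Return every data value stored for key, in insertion order."""
        return list(self._items.get(key, []))

    def ref(self, key, data):
        """Look the key up, adding it first when it is not yet present."""
        if key not in self._items:
            self.add(key, data)
        return self.find(key)

    @property
    def num_items(self):
        """Number of stored data items across all keys."""
        return sum(len(values) for values in self._items.values())


unique = HashTable()
unique.add("A1", 10)
rejected = unique.add("A1", 20)  # duplicate key, so not stored

multi = HashTable(multidata=True)
multi.add("A1", 10)
multi.add("A1", 20)
```

Keeping a list per key makes the difference between the default unique-key behaviour and the multiple-values case a single flag, which is roughly the distinction the MULTIDATA argument tag draws.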

18:18, 22nd August 2021

Linear Regression in Python

Linear regression serves as a foundational technique in statistics and machine learning, offering a method to model relationships between variables. In Python, this approach can be implemented using libraries such as NumPy, scikit-learn and statsmodels.

Each tool brings distinct advantages: NumPy handles numerical computations and array manipulations, scikit-learn provides a streamlined interface for building and evaluating models, and statsmodels offers detailed statistical insights. The process typically involves importing necessary modules, preparing and transforming data, fitting a model to the data, assessing its performance and using it to make predictions.

Whether the goal is to understand relationships between variables or to forecast outcomes, linear regression remains a versatile and widely applicable method. Its implementation in Python allows for both simplicity and depth, depending on the tools chosen and the level of analysis required.
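To make the fit, assess and predict steps concrete, the sketch below implements simple one-variable least squares in plain Python. It deliberately avoids the libraries named above so that it runs without any installation; the function names and the sample data are invented for illustration, and in practice one of those libraries would normally be used instead.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predicted response for a single input value."""
    return intercept + slope * x


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination: share of variance explained by the fit."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(x, slope, intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up data lying exactly on the line y = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)         # fit the model
score = r_squared(xs, ys, slope, intercept)  # assess its performance
forecast = predict(5, slope, intercept)      # use it to make a prediction
```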

12:45, 16th August 2021

How to Overlay Plots in R - Quick Guide with Example

Overlaying plots in R can be achieved using the lines() and points() functions to combine multiple datasets within a single visualisation. A scatter plot can be created first with the plot() function, followed by the addition of line or scatter plots using lines() or points() with specified parameters such as colour, line type and point symbols.

For example, three line plots can be overlaid by initially plotting one dataset and then sequentially adding others with distinct colours and legends to differentiate them. Similarly, two scatter plots can be combined by plotting one dataset and then using points() to add another with different colours, ensuring clarity through the inclusion of a legend. These techniques allow for the simultaneous comparison of multiple data series, enhancing the ability to analyse relationships and patterns within the data.

10:47, 9th August 2021

Installing Julia on Ubuntu

Installing Julia on Ubuntu can be achieved through multiple methods. The most straightforward approach is to download the latest stable version from the official Julia website, extract the files, move them to a system directory and create a symbolic link to enable command-line access. Installing through the apt-get package manager is an alternative, but it is less advisable because the packaged version may differ from the current release. Once installed, integrating Julia with Jupyter requires launching the Julia REPL and executing commands to install the IJulia package, which automatically configures the kernel for use within Jupyter Notebook or JupyterLab environments.

17:55, 7th August 2021

Top 5 Tips for RStudio Workbench and Desktop

RStudio Workbench, formerly known as RStudio Server Pro, has introduced several features and updates aimed at enhancing productivity for users working with R. Among the key improvements are the ability to utilise keyboard shortcuts for common tasks such as running code or managing documents, a multi-column layout that allows for simultaneous viewing of multiple files, tools for efficiently locating and tracing functions and variables across projects, multi-cursor editing to streamline code modifications and the capability to quickly access function source code and documentation. These features are designed to support more efficient coding, data analysis and application development, offering users greater flexibility and control over their workflow while working with R and its associated tools.
