11:21, 28th May 2021
SAS Workshops and Notes
The Social Science Computing Cooperative at UW-Madison provides a comprehensive set of SAS learning materials authored by Doug Hemken, covering everything from fundamental concepts to advanced programming techniques. The materials guide users through working with SAS interfaces on both Windows and Linux systems, managing SAS files and submitting commands, and understanding the core SAS language and its grammar.
Statistical procedures such as frequencies, crosstabs, means, correlations and regression are covered alongside three distinct graphics systems and their associated commands. More advanced topics include building and reading data sets, subsetting and merging data, working with arrays and macros and producing output data sets. The materials also address how to document SAS work using Markdown and RMarkdown for producing web pages, PDF handouts and Word documents that incorporate SAS code and output, and include guidance on running R from within SAS.
11:18, 28th May 2021
SASweave: Literate programming using SAS
SASweave is a tool designed to integrate SAS code, output and graphics into documents, enabling the creation of reports that combine executable code with its results. It processes a LaTeX-based source file containing SAS code, executes the code and generates a LaTeX file that includes the code, output and any generated graphics, which can then be compiled into a formatted document.
The tool supports processing of both SAS and R code within the same source file, with actions determined by file extensions. It also provides a tangling feature to extract SAS code for separate use. By ensuring that output directly reflects the executed code, SASweave facilitates literate programming, a method that intertwines documentation, code and results to enhance transparency and reproducibility in analytical workflows.
11:17, 28th May 2021
How to track the performance of your blog in R?
Antoine Soetewey demonstrates how to use the googleAnalyticsR package in R to analyse blog performance data drawn from Google Analytics. Over a year of data spanning December 2019 to December 2020, the blog attracted nearly 322,000 users, who generated over 428,000 sessions and 560,000 page views, with a notable traffic spike in late April 2020 caused by a viral post about downloading free Springer books during the COVID-19 lockdown.
He then works through a series of visualisations built with ggplot2, covering daily session trends, traffic by channel, sessions broken down by day of the week and hour of the day, monthly comparisons and top-performing pages. A particularly useful section addresses time-normalised page views, which adjust for the fact that older posts have had more time to accumulate traffic, allowing for fairer comparisons between articles published at different times. The analysis finds that organic search accounts for the majority of traffic, that desktop usage peaks later in the day than mobile usage and that weekday traffic is consistently higher than weekend traffic, suggesting readers engage with the content primarily for professional or educational purposes.
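The idea behind time-normalised page views can be illustrated with a little arithmetic. The sketch below is in Python rather than the post's R, and it assumes that normalisation means dividing total page views by the number of days since publication. The function name and all figures are hypothetical and are not taken from the post.

```python
from datetime import date


def views_per_day(page_views, published, as_of):
    """Average daily page views between publication and the reporting date."""
    days_online = (as_of - published).days
    if days_online <= 0:
        raise ValueError("as_of must be later than the publication date")
    return page_views / days_online


# Hypothetical figures for two posts, measured on the same reporting date.
as_of = date(2020, 12, 16)
older = views_per_day(30_000, date(2019, 12, 17), as_of)  # online for 365 days
newer = views_per_day(6_000, date(2020, 11, 16), as_of)   # online for 30 days

print(f"older post: {older:.1f} views/day, newer post: {newer:.1f} views/day")
```

On raw totals the older post looks five times more popular, but per day online the newer post attracts more than twice the traffic, which is the kind of reversal the normalisation is meant to expose.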
11:16, 28th May 2021
RPubs serves as a straightforward online platform for publishing documents created using R Markdown, enabling users to share analyses, reports, tutorials and reproducible research with minimal setup. The process involves writing an R Markdown file, converting it to HTML and publishing it directly through RStudio, which generates a public URL without requiring users to manage hosting or deployment. Published content typically includes statistical analyses, visualisations, code and narrative explanations, all consolidated into a single document. While RPubs offers free hosting, one-click publishing and permanent URLs, its minimalist design limits features such as custom domains, advanced styling and multipage site creation, often prompting users to transition to more comprehensive tools like Quarto or Posit Connect for complex publishing needs.
11:15, 28th May 2021
The R Graph Gallery is a comprehensive online collection of over 400 charts created using the R programming language, organised into nearly 50 chart categories that span distribution, correlation, ranking, mapping, flow and evolution chart types. Each example includes reproducible code and detailed explanations, with foundational tutorials covering core structures before progressing to step-by-step customisation guides.
The gallery places particular emphasis on the tidyverse and ggplot2 packages, and is connected to the Data to Viz project, which provides a decision tree to help users select the most appropriate chart type for their data. Beyond beginner-level content, the gallery also curates a selection of exceptional R-based visualisations sourced from across the internet and community submissions, offering templates and code snippets for those looking to advance their data visualisation skills.
11:13, 28th May 2021
Laying out multiple plots on a page in R
Creating multiple plots in R and arranging them on a page involves using packages such as ggplot2, gtable and grid, which define the structure of individual plots through graphical objects. Functions like grid.arrange from gridExtra allow simple layouts by specifying rows, columns, or custom matrices, while more complex arrangements, such as aligning plot panels or embedding one plot within another, may require converting plots to grobs and using tools like gtable or egg for precise control.
Techniques include using annotation_custom for insets, adjusting dimensions to ensure alignment and combining plots with tables or other graphical elements. Alternatives like cowplot and patchwork offer additional flexibility, while the grid package provides foundational low-level functions for layout management. These methods enable the organisation of multiple plots, whether for alignment, shared legends, or multipage outputs, ensuring consistent visual presentation and adaptability to different display requirements.
11:12, 28th May 2021
Vignette: Write & Read Multiple Excel files with purrr
Martin Chan demonstrates how to use the functional programming package purrr, alongside readxl and writexl, to write and read multiple Excel and CSV files in R. The approach involves splitting a dataset into a list of data frames, in this case using the iris dataset divided by species, and then iterating over those data frames to export them as either a multi-sheet Excel file or individual CSV files.
For reading files back into R, the post outlines two broad options: loading each dataset separately into the global environment using assign(), or reading everything into a single list, which keeps the workspace tidier and makes it easier to apply operations across all data frames simultaneously. Both approaches are demonstrated for Excel and CSV formats, with the purrr method presented as cleaner and more consistent with tidyverse conventions than traditional alternatives such as lapply() or for loops.
11:11, 28th May 2021
A Compendium of Clean Graphs in R
Written by Eric-Jan Wagenmakers and Quentin F. Gronau, this compendium provides a practical guide to producing clean, publication-ready graphs using R, built around the philosophy that a good graph should be as simple as possible without sacrificing necessary information. It covers a broad range of graph types, including correlations, histograms, line plots, bar plots, density plots, time series and multi-panel figures, with each example accompanied by reproducible R code that readers can copy and adapt for their own purposes. The authors emphasise several core principles throughout, namely investing sufficient time and effort, removing unnecessary graphical elements, ensuring visual balance, using large font sizes and moving beyond R's default settings.
The majority of examples are produced in base R, though some use ggplot2, and many of the included graphs were contributed by colleagues and collaborators. A dedicated section covers graphs produced by JASP, a free statistical analysis program developed at the University of Amsterdam, while a miscellaneous section features more specialist visualisations such as funnel plots, network graphs, forest plots and heatmaps. The compendium is intended as a hands-on reference rather than a conceptual guide to choosing graph types, with the authors estimating that applying its lessons will be roughly 80 per cent copying existing code and 20 per cent adapting it to suit individual needs.
11:09, 28th May 2021
Creating PDFs with fpdf2 and Python
The fpdf2 library offers a straightforward approach to generating PDF documents using Python. It supports fundamental tasks such as adding text, images and basic formatting, making it suitable for simple reports or forms.
While it lacks advanced features like complex layouts or interactive elements, it provides limited HTML support through the HTMLMixin class, enabling the conversion of basic HTML structures into PDFs. This capability allows for the inclusion of tables, lists and styled text, though the results may not match the precision of more specialised tools. The library also integrates with frameworks like Web2Py, facilitating the creation of reports within web applications.
However, its limitations become apparent when dealing with intricate designs or dynamic content, where alternatives such as ReportLab might be more appropriate. Despite these constraints, fpdf2 remains a viable option for users requiring minimalistic PDF generation without the complexity of more feature-rich libraries.
11:08, 28th May 2021
Convert dataframe into PDF report in Python
A pandas DataFrame can be converted into a PDF report in Python with the pdfkit library, a wrapper around the wkhtmltopdf command-line tool, which renders HTML, including images and more complex printable layouts, to PDF. Setup therefore involves installing pdfkit via pip and separately downloading wkhtmltopdf, which performs the actual rendering.
Once the necessary tools are in place, data is imported into a dataframe, in this case from an Excel file containing salary information, and then converted into HTML using Pandas' built-in to_html method, which writes the output to an HTML file. That HTML file is then passed to pdfkit, which converts it into a finished PDF document using the wkhtmltopdf software installed on the local machine.
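The two steps described above might be sketched as follows. This is an illustration under stated assumptions rather than the article's own code: a small hypothetical salary table stands in for the Excel file (which would normally be loaded with pd.read_excel), the function and file names are invented, and the PDF step only works if both pdfkit and the wkhtmltopdf binary are installed.

```python
import pandas as pd

# Hypothetical salary data standing in for the Excel file in the article.
salaries = pd.DataFrame(
    {
        "Name": ["Asha", "Ben", "Chloe"],
        "Department": ["Finance", "IT", "HR"],
        "Salary": [52000, 61000, 48000],
    }
)


def dataframe_to_html(df, html_path):
    """Step 1: write the DataFrame to an HTML file as a table."""
    df.to_html(html_path, index=False)
    return html_path


def html_to_pdf(html_path, pdf_path):
    """Step 2: render the HTML file to PDF via pdfkit and wkhtmltopdf."""
    # Imported here so that step 1 can be run without pdfkit installed.
    import pdfkit

    pdfkit.from_file(html_path, pdf_path)
    return pdf_path
```

A typical run would call dataframe_to_html(salaries, "salaries.html") followed by html_to_pdf("salaries.html", "salaries.pdf"). Keeping the steps separate makes it possible to inspect the intermediate HTML in a browser before converting it.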