Technology Tales

Notes drawn from experiences in consumer and enterprise technology

18:22, 13th January 2022

Handling Categorical Data in R - Part 2

Part 2 of the Rsquared Academy series on handling categorical data in R focuses on summarising categorical data using a variety of functions and packages. Unlike numeric data, categorical data are best summarised through counts, proportions, cumulative frequencies and cross tables, and R provides several tools to achieve this.

Key functions covered include nlevels() and levels() for identifying the number and names of categories, table() and fct_count() for frequency tabulation, and prop.table() or proportions() for calculating proportions and percentages. Two-way and multidimensional tables can be built using table() and xtabs(), alongside supporting functions such as margin.table(), addmargins() and ftable() for computing marginal totals and formatting complex tables.

Handling of missing values within tables is addressed through the useNA argument, and subsetting tables using bracket notation is demonstrated. For more detailed cross-tabulation output similar to that produced by SAS or SPSS, the CrossTable() function from the gmodels package and ds_cross_table() from the descriptr package are presented as useful alternatives.
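
The tabulation workflow described above can be sketched in base R as follows; the blood-group and sex data are invented purely for illustration:

```r
# Invented example data: a factor of blood groups
blood <- factor(c("A", "B", "A", "O", "AB", "O", "A", "O"))

nlevels(blood)          # number of categories: 4
levels(blood)           # "A" "AB" "B" "O"

tab <- table(blood)     # frequency counts per category
prop.table(tab)         # proportions, summing to 1

# A two-way table with marginal totals
sex <- factor(c("F", "M", "F", "F", "M", "M", "F", "M"))
xt <- table(blood, sex)
addmargins(xt)          # appends row and column sums
```

The same counts could be produced with forcats::fct_count(blood), which returns a tidy data frame rather than a table object.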

15:45, 12th January 2022

How To Connect R Shiny to Postgres Database

Connecting R and R Shiny to a PostgreSQL database is a straightforward process that requires only a few packages and the relevant connection parameters, such as the database name, host, port and user credentials. To demonstrate this, a publicly available earthquake dataset from Kaggle is first imported into a PostgreSQL table using a GUI management tool, after which R establishes a connection to the database and retrieves the data using standard query functions.

Building on this foundation, an interactive R Shiny dashboard can be constructed that renders a map of earthquake locations, filtered dynamically by magnitude using a slider input, with the database queried each time the input value changes and the connection closed once the data has been retrieved. While this approach prioritises demonstrating proper connection handling over performance, it is noted that loading the full dataset outside the server function and filtering it locally could improve responsiveness.
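
A minimal sketch of the connection pattern described above, using the DBI and RPostgres packages; the database name, table name, column and credentials below are placeholders rather than the article's actual values, and a running PostgreSQL server is assumed:

```r
library(DBI)

# Placeholder connection parameters -- substitute your own
con <- dbConnect(
  RPostgres::Postgres(),
  dbname   = "earthquakes_db",
  host     = "localhost",
  port     = 5432,
  user     = "postgres",
  password = Sys.getenv("PG_PASSWORD")
)

# Query filtered by magnitude, as the slider input would do
quakes <- dbGetQuery(con, "SELECT * FROM earthquakes WHERE magnitude >= 5")

# Close the connection once the data has been retrieved
dbDisconnect(con)
```

In a Shiny app, the dbConnect/dbGetQuery/dbDisconnect sequence would sit inside a reactive expression so that it re-runs whenever the slider value changes.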

13:16, 11th January 2022

Python Excel is a resource dedicated to helping users automate and manage Excel spreadsheet tasks using Python, covering operations such as reading, writing, modifying and organising data across Excel files. Several Python libraries are available for this purpose, the most widely used being openpyxl, which was developed by Eric Gazoni and Charlie Clark and supports reading and writing XLSX and related file formats without requiring Excel to be installed.

Other notable libraries include xlrd, which is suited to reading both XLS and XLSX files across Windows, Linux and Mac platforms; xlsxwriter, which specialises in data formatting and charting; xlwt, which is designed for the older Excel 95 to 2003 file formats; and xlutils, which bundles xlrd and xlwt together into a single package.

The practical value of combining Python with Excel is significant, particularly for professionals in data analysis, data mining and machine learning, where manually processing large volumes of spreadsheet data would be time-consuming and prone to error. By writing relatively simple code, tasks that might otherwise take days or weeks can be completed in seconds, making Python a powerful tool for automating repetitive office work and improving overall efficiency.
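
A small sketch of the kind of automation described above, using openpyxl to write and then read back a workbook without Excel installed; the filename and cell contents are invented for illustration:

```python
from openpyxl import Workbook, load_workbook

# Create a workbook and write a header row plus one data row
wb = Workbook()
ws = wb.active
ws.title = "Report"
ws.append(["name", "value"])
ws.append(["widgets", 42])
wb.save("report.xlsx")

# Read the file back and collect cell values row by row
wb2 = load_workbook("report.xlsx")
ws2 = wb2["Report"]
rows = [[cell.value for cell in row] for row in ws2.iter_rows()]
print(rows)  # [['name', 'value'], ['widgets', 42]]
```

From here, the same loop structure scales to modifying thousands of rows across many files, which is where the time savings mentioned above come from.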

12:15, 10th January 2022

Learn Shiny

Build interactive web applications using the R programming language by leveraging the Shiny package, which simplifies the creation of dynamic user interfaces and server logic. A typical Shiny app consists of a user interface object defining layout and appearance, a server function containing computational instructions, and a call to the shinyApp function to combine them.

Installation requires running install.packages("shiny"), and example apps such as Hello Shiny demonstrate core concepts, including using sliders to adjust histogram bins in real time. Apps are structured as an app.R file within a directory, launched with runApp(), and can be modified by editing parameters such as titles, slider ranges or visual elements. Users can relaunch apps through RStudio or command-line tools, and further development involves exploring the built-in examples, deploying apps online, or customising layouts and reactivity.
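
The ui/server/shinyApp structure can be sketched as below, in the style of the Hello Shiny example (a slider controlling histogram bins over the built-in faithful dataset); saved as app.R, it would be launched with runApp():

```r
library(shiny)

# User interface: layout and appearance
ui <- fluidPage(
  titlePanel("Hello Shiny!"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("bins", "Number of bins:", min = 1, max = 50, value = 30)
    ),
    mainPanel(plotOutput("distPlot"))
  )
)

# Server function: computational instructions
server <- function(input, output) {
  output$distPlot <- renderPlot({
    x    <- faithful$waiting
    bins <- seq(min(x), max(x), length.out = input$bins + 1)
    hist(x, breaks = bins, col = "grey", border = "white")
  })
}

# Combine the two into an app object
app <- shinyApp(ui = ui, server = server)
```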

12:08, 10th January 2022

Top 7 Best R Shiny Books and Courses That Are Completely Free

Highlighting free resources for learning R Shiny, here are seven books and courses designed to support users at varying skill levels, from foundational concepts to advanced deployment techniques. The materials cover topics such as reactive programming, user interface design, production-grade application development, integration with JavaScript and AWS, and practical deployment strategies using tools like Docker and Git. The courses suit visual learners, while the books provide in-depth guidance on building interactive applications, optimising performance and leveraging modern web technologies. All resources are accessible without cost, offering structured learning paths for anyone seeking proficiency in R Shiny for data visualisation and web application development.

12:06, 10th January 2022

How renv restores packages from r-universe for reproducibility or production

Restoring packages from r-universe leverages metadata embedded in package DESCRIPTION files to trace each package's origin to its upstream git repository. renv identifies the exact commit hash and remote repository URL from these fields, enabling precise reinstalls either via the cranlike repository, where still available, or directly from git. Because the commit identifiers are immutable, reproducibility does not depend on version numbers. This approach aligns with the r-universe model of continuous deployment from git: upstream repositories serve as the definitive source of historical code, removing the need for archival storage.
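
The DESCRIPTION metadata involved looks roughly like the fragment below; the field names follow the convention used by the remotes package, and the package name, URL and commit hash are invented for illustration:

```text
Package: examplepkg
Version: 0.3.1
RemoteType: git
RemoteUrl: https://github.com/user/examplepkg
RemoteSha: 1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b
```

It is the RemoteSha commit hash, not the Version field, that pins the exact state of the code for a later restore.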

12:04, 10th January 2022

Handling Categorical Data in R - Part 1

Categorical data, also known as qualitative data, are a fundamental component of data science projects and differ from numerical and other data types in how they are read, stored, summarised, reshaped and visualised. Such data are always discrete, consist of names or labels and take on a limited, fixed number of possible values, with analysis typically involving data tables. They are further divided into nominal data, which have no intrinsic order (such as blood group or gender), and ordinal data, which can be ranked (such as satisfaction ratings or education level), though the magnitude of the differences between categories cannot be determined.

In R, such data are stored using a structure called a factor, and several functions are available for working with this data type, including is.factor() and is.ordered() for membership testing, as.factor() and as_factor() for converting other data types to factors, and the factor() and ordered() functions for finer control over specifying levels, modifying labels, handling missing values and creating ordered factors. The key distinction between as.factor() and as_factor() is that the former orders levels alphabetically, whilst the latter orders them by their first appearance in the data.
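
The level-ordering distinction can be seen in a short sketch; the fruit and satisfaction data are invented, and as_factor() is assumed to come from the forcats package:

```r
library(forcats)

x <- c("banana", "apple", "cherry", "apple")

levels(as.factor(x))   # alphabetical: "apple" "banana" "cherry"
levels(as_factor(x))   # first appearance: "banana" "apple" "cherry"

# Finer control: an ordered factor with explicit levels
sat <- ordered(
  c("low", "high", "medium", "low"),
  levels = c("low", "medium", "high")
)
is.ordered(sat)  # TRUE
```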

13:18, 8th January 2022

Exception handling in Julia is the process of managing unexpected conditions that arise during programme execution, preventing abrupt termination by providing an alternative path for the programme to follow. Julia achieves this primarily through try-catch blocks, where potentially problematic code is placed in the try section and any resulting exception is caught and addressed in the catch section.

Exceptions propagate up through the call stack until a suitable try-catch block is found, and they can also be stored in a variable within the catch block to allow for more precise handling of multiple exception types, a method known as the canonical approach. A finally clause can be added to ensure that certain code, such as closing an open file, runs regardless of whether an exception occurred.

Custom exceptions can be thrown deliberately using the throw() function, and the error() function generates a general ErrorException, while exceptions can also be re-thrown from within a catch block to be handled further up the call stack. Julia also provides a range of built-in exception types covering common error conditions, including ArgumentError, BoundsError, DomainError, DivideError, MethodError, OverflowError and UndefVarError, among others.
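
The try/catch/finally pattern described above can be sketched as follows; the function and its behaviour are invented for illustration:

```julia
function safe_divide(a, b)
    try
        # Throw a built-in exception type deliberately
        b == 0 && throw(DivideError())
        return a / b
    catch e                   # the exception is bound to a variable...
        if e isa DivideError  # ...so multiple types can be distinguished
            return Inf
        else
            rethrow()         # pass anything unexpected up the call stack
        end
    finally
        # runs whether or not an exception occurred,
        # e.g. for closing an open file
    end
end

safe_divide(10, 2)  # 5.0
safe_divide(1, 0)   # Inf
```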

09:49, 8th January 2022

New features in DataFrames.jl 1.3: Part 1

New features in DataFrames.jl 1.3: Part 2

New features in DataFrames.jl 1.3: Part 3

New features in DataFrames.jl 1.3: Part 4

New features in DataFrames.jl 1.3: Conclusion

Version 1.3 of DataFrames.jl, a Julia data manipulation package, introduced several notable improvements across performance, usability and functionality. Row aggregation operations were made significantly faster through optimised fast-path implementations for common functions such as sum, mean, median, minimum, maximum and others when used within the package's data transformation mini-language.

The release also extended broadcasting support to allow column selectors such as Not, Between, Cols and All to be resolved within the context of a data frame during transformation operations, reducing the need for verbose syntax. Indexing improvements enabled columns to be added to data frame views for the first time, provided the view was created using a full column selector, which in turn allowed standard functions like transform! to operate directly on filtered subsets of data.

A new leftjoin! function was introduced to perform in-place left joins, offering memory efficiency for large datasets and demonstrating performance comparable to the R data.table package in benchmarks. Additional changes included: greater control over group ordering in the groupby function through a sort keyword argument; a fill keyword argument for the unstack function to replace missing values with a specified default; the deprecation of the delete! function in favour of deleteat!, for consistency with Julia Base conventions; and a fix to incorrect sorting behaviour when an empty column selector was passed to the sort function.
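
A few of these additions can be sketched together; the data frames below are invented for illustration:

```julia
using DataFrames

df  = DataFrame(id = [1, 2, 3], score = [10, 20, 30])
ref = DataFrame(id = [1, 2], label = ["a", "b"])

# In-place left join: df gains a :label column, no copy of df is made
leftjoin!(df, ref, on = :id)

# Control group ordering via the sort keyword argument
gdf = groupby(df, :label, sort = true)

# unstack with a default value for missing combinations
long = DataFrame(row = [1, 1, 2], var = ["x", "y", "x"], val = [1, 2, 3])
wide = unstack(long, :row, :var, :val, fill = 0)
```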

18:12, 2nd January 2022

ROBOCOPY is a robust command-line file copying utility built into Windows that supports a wide range of options for copying, moving and mirroring files and directories between locations. It is compatible with Windows 10, Windows 11 and multiple versions of Windows Server, as well as Azure Local. Users can specify a source, a destination and optional file filters, then apply numerous parameters to control behaviour, including copying subdirectories, preserving file attributes and timestamps, enabling restartable or backup mode, and setting retry counts and wait intervals.

Multithreaded copying is supported to improve performance, and I/O throttling options allow bandwidth usage to be managed during transfers. File selection can be refined by including or excluding files based on attributes, age, size or last access date, and logging options allow detailed output to be written to a file for review. The tool also returns exit codes indicating the outcome of each operation, ranging from no files copied through to partial or complete failures, with any code of eight or above signifying at least one failure occurred during the process.
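
A representative invocation combining several of the options described above; the source, destination and log paths are placeholders:

```bat
ROBOCOPY C:\Data D:\Backup\Data /E /COPY:DAT /Z /R:3 /W:5 /MT:8 /LOG:C:\Logs\backup.log
REM /E         copy subdirectories, including empty ones
REM /COPY:DAT  preserve data, attributes and timestamps
REM /Z         restartable mode
REM /R:3 /W:5  retry three times, waiting five seconds between attempts
REM /MT:8      multithreaded copy using eight threads
```

After the run, checking whether the exit code is below eight distinguishes success (possibly with extra or skipped files) from at least one failure.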
