Technology Tales

Notes drawn from experiences in consumer and enterprise technology

22:02, 29th January 2022

OpenCPU is an HTTP-based API system for executing R functions and scripts remotely, using standard GET and POST methods to retrieve objects and perform remote procedure calls, respectively. The API is structured around a configurable root path and exposes endpoints for accessing installed R packages, their functions, datasets and documentation, as well as temporary sessions that store the outputs of function or script executions. R objects can be retrieved in a range of formats including JSON, CSV, PDF and PNG, and function arguments can be passed using several content types such as URL-encoded form data, multipart form data or JSON.

Scripts are executed by posting to their file path, with the interpreter determined by file extension, supporting formats including R, LaTeX, knitr and Markdown. A simplified JSON RPC mode is available for cases where only the output data are needed, returning results directly in a single request rather than requiring a follow-up retrieval step. The system also supports static web applications bundled within R packages and offers continuous integration with GitHub, whereby pushing a commit to a repository's master branch can trigger automatic package installation on an OpenCPU server.
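The request pattern described above can be sketched with Python's standard library. This is a minimal illustration rather than an official client: the server address is a placeholder, `/ocpu` is assumed as the API root (the root path is configurable), the helper names `function_url` and `build_call` are invented here, and `stats::rnorm` simply stands in for any function in an installed package.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: placeholder server; "/ocpu" is assumed as the API root,
# which a real deployment may have configured differently.
OCPU_ROOT = "https://example.org/ocpu"


def function_url(package: str, function: str, json_rpc: bool = False) -> str:
    """Build the URL of an R function inside an installed package.

    Appending "/json" asks for the simplified JSON RPC mode, where the
    result comes back directly instead of via a follow-up retrieval.
    """
    url = f"{OCPU_ROOT}/library/{package}/R/{function}"
    return url + "/json" if json_rpc else url


def build_call(package: str, function: str, args: dict,
               json_rpc: bool = False) -> Request:
    """Prepare, without sending, a POST that passes URL-encoded arguments."""
    body = urlencode(args).encode("utf-8")
    return Request(
        function_url(package, function, json_rpc),
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical call: three random normal draws with mean 10.
request = build_call("stats", "rnorm", {"n": 3, "mean": 10}, json_rpc=True)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the prepared request to `urllib.request.urlopen` would send it to a real server. Without the `/json` suffix, the response would instead point to a temporary session whose outputs are then fetched with GET requests.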

08:54, 24th January 2022

The High-Paying Side Hustles for Data Scientists

The rise of remote working since the COVID-19 pandemic has led many data scientists to explore ways of supplementing their income through side work. Freelancing platforms such as Upwork, Toptal, AngelList and Kolabtree offer varying levels of entry, from open project bidding to elite networks requiring several years of experience. Technical writing is another viable avenue, whether through blogging on platforms like Medium, contributing articles to publications that offer financial rewards based on readership, or taking on ghostwriting work that, while uncredited, tends to pay at a premium rate. Contract work, covering areas such as machine learning model design, data analysis and research, offers clear terms and flexible hours, while consultancy, typically charged at an hourly rate, suits those with substantial field experience who can advise companies on data science strategy and investment. Career coaching rounds out the options, with platforms connecting experienced professionals with graduates and jobseekers needing guidance on interviews, networking and career direction. Beyond immediate earnings, these pursuits can broaden professional experience, strengthen a personal brand and contribute meaningfully to long-term career development.

14:00, 23rd December 2021

SciML is an open-source ecosystem designed for scientific machine learning, offering a modular framework that integrates differentiable programming with physics-informed AI to solve complex problems in differential equations, nonlinear systems and inverse problems. Built primarily in Julia, it achieves high performance and scalability through distributed and GPU parallelism, while supporting interoperability with Python and R via tools like diffeqpy and diffeqr.

The ecosystem includes advanced solvers for a wide range of equations, automated model discovery tools and methods for sparsity acceleration and compiler-assisted analysis, enabling efficient simulation and optimisation. It also provides ML-assisted tools for accelerating scientific computations, such as neural differential equations and surrogate models, alongside extensive community resources for collaboration and support. The platform fosters research and development through a large contributor base and a suite of tools for benchmarking and testing new methodologies, aiming to bridge the gap between theoretical advancements and practical applications in scientific computing.

16:46, 2nd December 2021

DataKind UK is a charity that supports third-sector organisations in the UK by enhancing their use of data analysis and science to address social challenges. Established in 2013, it connects these organisations with skilled volunteers who provide free, expert assistance to improve decision-making, build capacity and drive innovation. By fostering collaboration between data professionals and charities, voluntary groups and social enterprises, the organisation helps its partners navigate complex issues, leverage insights from data and strengthen their impact. Over the years, it has supported more than 280 organisations through hundreds of projects, contributing thousands of pro bono hours and demonstrating the value of data-driven approaches in addressing societal needs.

16:49, 21st October 2021

Open Neural Network Exchange (ONNX)

An open format designed to enable machine learning models to be used across various frameworks and hardware, ONNX provides a standardised set of operators and file formats that facilitate compatibility between different tools and runtimes. It supports a wide range of frameworks and accelerators, allowing developers to leverage hardware optimisations while maintaining flexibility in model development. As a community-driven project, ONNX encourages collaboration through contributions, working groups and events such as meetups and surveys aimed at gathering feedback to guide its ongoing development.

14:47, 28th September 2021

Data Sources in Power BI Desktop

Power BI Desktop supports a wide range of data sources, from traditional databases and spreadsheets to cloud-based services and web resources. Connecting to these sources involves selecting the appropriate protocol and specifying details such as server addresses, URLs, or file paths. For scenarios requiring shared connection settings, PBIDS files offer a way to export and distribute connection configurations. These files use a structured JSON format to define protocols, addresses and optional parameters like connection mode.

Examples include configurations for Azure Analysis Services, SharePoint lists, SQL Server and web data sources. When using PBIDS files, users must ensure compatibility with supported protocols and avoid including encrypted columns or unsupported features. A PBIDS file can be exported from Power BI Desktop or written by hand in a text editor, allowing for flexibility in defining connection details. This approach facilitates collaboration and standardisation in data integration workflows.
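As a rough sketch of that JSON structure, the following Python builds a PBIDS-style document for a SQL Server source. The field names (`version`, `connections`, `details`, `protocol`, `address`, `options`, `mode`) and the `tds` protocol value reflect the layout as commonly documented and should be checked against Microsoft's current documentation. The server and database names are placeholders, and the helper `sql_server_pbids` is invented for illustration.

```python
import json


def sql_server_pbids(server: str, database: str,
                     mode: str = "DirectQuery") -> dict:
    """Return a PBIDS-style connection definition for a SQL Server source.

    Assumption: "tds" is the protocol identifier for SQL Server and the
    address holds "server" and "database" keys.
    """
    return {
        "version": "0.1",
        "connections": [
            {
                "details": {
                    "protocol": "tds",
                    "address": {"server": server, "database": database},
                },
                "options": {},
                "mode": mode,  # optional connection mode parameter
            }
        ],
    }


# Placeholder names; substitute a real server and database.
config = sql_server_pbids("sql.example.org", "SalesDB")
print(json.dumps(config, indent=2))
```

Saving the printed JSON in a file with a `.pbids` extension would give colleagues a shareable starting point for the same connection.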

14:03, 26th August 2021

Text Mining Node in SAS Model Studio on SAS Viya

The Text Mining node in SAS Model Studio on SAS Viya enables users to process unstructured data, such as free-form comments and reviews, and transform it into structured, quantitative representations through singular value decomposition, which can then be used as inputs for predictive modelling. When multiple variables carry a text role, the node defaults to the one with the greatest length, though users can override this by rejecting unwanted variables in the Data tab. Configurable parsing options include part-of-speech tagging, noun group extraction, entity extraction and term stemming, while a minimum document threshold controls which terms are retained.
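The idea behind a minimum document threshold can be illustrated in plain Python. This is a generic sketch of document-frequency filtering, not how the SAS node is implemented: the function name `retained_terms`, the simple tokenising pattern and the sample comments are all assumptions made for the example.

```python
import re
from collections import Counter


def retained_terms(documents, min_docs=2):
    """Keep only terms that occur in at least `min_docs` documents."""
    doc_freq = Counter()
    for text in documents:
        # set() makes a term count once per document, however often it repeats
        doc_freq.update(set(re.findall(r"[a-z']+", text.lower())))
    return {term for term, count in doc_freq.items() if count >= min_docs}


# Made-up free-form comments standing in for real review text.
comments = [
    "Delivery was slow but support was helpful",
    "Helpful support and quick delivery",
    "Slow refund process",
]
print(sorted(retained_terms(comments, min_docs=2)))
# → ['delivery', 'helpful', 'slow', 'support']
```

Here "was" appears twice in the first comment but in only one document, so it is dropped. Raising the threshold trims rare terms before any decomposition step is applied.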

From SAS Viya 4 version 2021.1.3 onwards, users can also upload custom lists such as stop lists and start lists. The node generates up to 25 topic-based features, and in demonstrated comparisons, a Decision Tree model that incorporated those features outperformed one that did not, illustrating the potential value of extracting information from unstructured data. SAS Model Studio also supports automated pipeline creation, which detects text-role variables and incorporates the Text Mining node into a full pipeline covering data preparation, model building, hyperparameter tuning and model selection. In the results shown, the generated features frequently ranked highly in variable importance across multiple model types.

14:02, 26th August 2021

Natural Language Processing: An Introduction

Natural Language Processing offers tools to extract meaningful insights from unstructured text data, enabling applications across various fields. Techniques such as tokenisation, sentiment analysis and entity recognition allow the transformation of textual information into structured formats that can be integrated with relational databases for predictive modelling and decision-making. In healthcare, for example, analysing patient notes can reveal psychosocial factors influencing treatment outcomes, while in legal contexts, automated summarisation of case documents aids in identifying key details. Beyond these, NLP supports innovations like chatbots and translation services, though its effectiveness relies heavily on the quality of input data. As the field advances, the integration of text analytics with other data sources will become increasingly vital for comprehensive insights, particularly in domains where traditional data alone may not capture the full complexity of a problem.

14:02, 4th August 2021

Data Science Experience | SAS

The Data Science Experience highlights real-world applications of data science across industries, showcasing how professionals address complex challenges through innovative solutions. Examples include using integrated tools to enhance customer experiences in banking and developing strategies to ensure model reliability in digital transformation initiatives. These stories illustrate the importance of combining technical expertise with clear objectives to solve problems in healthcare, insurance and public sector contexts. Additional resources such as training programs, events and cloud-based analytics deployment options are available to support further exploration and skill development in the field.
