Technology Tales

Notes drawn from experiences in consumer and enterprise technology

13:49, 8th September 2022

Excel Formula Generator

Formula Bot is an AI-powered data analysis platform aimed primarily at marketers and data teams. It enables users to upload, connect and combine data from multiple sources and then query that data in plain language, whichever language they write in. The platform can generate interactive charts and graphs, perform data transformation tasks such as cleaning, merging and reshaping datasets, and carry out text analysis functions including sentiment detection, keyword extraction and language translation.

Users can export results to Excel, create formatted reports and schedule recurring analyses to run automatically on a daily, weekly or monthly basis. Additional capabilities include a curated data explorer, embeddable analytics, web scraping, code transparency showing the underlying Python, SQL or R generated for each request, and a knowledge base for improving query accuracy. Security features include end-to-end encryption via AWS infrastructure, row-level access controls and isolated sandbox environments for each session, with a stated commitment to never using customer data for AI model training.
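To give a feel for what the cleaning, merging and reshaping steps involve, here is a minimal sketch in plain Python. It is not the code Formula Bot generates; the `orders` and `customers` data and the helper names are hypothetical, chosen only to show one pass of each step.

```python
# Hypothetical sample data: orders keyed by customer ID, plus a lookup of
# each customer's acquisition channel. Illustrative only.
orders = [
    {"customer_id": "C1", "month": "2022-07", "revenue": " 120.50 "},
    {"customer_id": "C2", "month": "2022-07", "revenue": "80"},
    {"customer_id": "C1", "month": "2022-08", "revenue": "95.25"},
    {"customer_id": "C2", "month": "2022-08", "revenue": ""},  # missing value
]
customers = {"C1": "Email", "C2": "Social"}


def clean(rows):
    """Strip whitespace, convert revenue to float and drop rows with no revenue."""
    cleaned = []
    for row in rows:
        value = row["revenue"].strip()
        if not value:
            continue
        cleaned.append({**row, "revenue": float(value)})
    return cleaned


def merge(rows, lookup):
    """Left-join each row to its channel; unknown customer IDs get None."""
    return [{**row, "channel": lookup.get(row["customer_id"])} for row in rows]


def pivot(rows):
    """Reshape long rows into a channel -> month -> summed revenue table."""
    table = {}
    for row in rows:
        by_month = table.setdefault(row["channel"], {})
        by_month[row["month"]] = by_month.get(row["month"], 0.0) + row["revenue"]
    return table


result = pivot(merge(clean(orders), customers))
print(result)
# {'Email': {'2022-07': 120.5, '2022-08': 95.25}, 'Social': {'2022-07': 80.0}}
```

Keeping each step as a separate function mirrors how such tools tend to present transformations: each stage can be inspected on its own before the next one runs.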

16:54, 17th August 2022

Six tips for better spreadsheets

Nature examines how widely used spreadsheet tools such as Microsoft Excel and Google Sheets are frequently misused, drawing on insights from data scientists and researchers. Stephanie Labou, a data-science librarian at the University of California, San Diego, highlights common pitfalls encountered in practice, including errors arising from manually entered data such as GPS coordinates. The piece offers six practical tips aimed at improving how spreadsheets are structured and used, with the broader goal of encouraging more reliable and reproducible data handling in scientific and research contexts.
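As an illustration of the kind of check that catches manual-entry errors in coordinates, here is a minimal Python sketch. It is not one of the article's six tips verbatim; it assumes coordinates are meant to be stored as decimal degrees, and the sample values are hypothetical.

```python
def coordinate_problems(lat, lon):
    """Return a list of problems found in a latitude/longitude pair.

    Assumes values should be decimal degrees (latitude -90..90,
    longitude -180..180). An empty list means no problem was found.
    """
    problems = []
    try:
        lat, lon = float(lat), float(lon)
    except (TypeError, ValueError):
        # e.g. degrees-minutes text such as "32°52'N" typed into the cell
        return ["not numeric"]
    if not -90 <= lat <= 90:
        problems.append("latitude out of range")
        # A bad latitude that would be a valid longitude (and vice versa)
        # often means the two columns were swapped during entry.
        if -90 <= lon <= 90 and -180 <= lat <= 180:
            problems.append("latitude and longitude may be swapped")
    if not -180 <= lon <= 180:
        problems.append("longitude out of range")
    return problems


# Hypothetical rows as they might have been typed into a spreadsheet.
rows = [("32.88", "-117.23"), ("-117.23", "32.88"), ("32°52'N", "-117.23")]
for lat, lon in rows:
    print(lat, lon, coordinate_problems(lat, lon) or "ok")
```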

17:50, 21st July 2022

Apache Superset

An open-source data exploration and visualisation platform, Apache Superset offers a range of tools for users to interact with data through intuitive interfaces, including a drag-and-drop chart builder, SQL query capabilities and pre-installed visualisations. It supports integration with numerous databases, from traditional systems to modern cloud-native solutions, and provides features such as data caching, customisable dashboards and semantic layers for complex data transformations. Designed to be lightweight and scalable, the platform enables teams to create interactive dashboards, explore datasets and perform detailed analysis using cross-filters and drill-down functionalities. Adopted by numerous organisations, it is used for self-serve analytics, allowing users to generate insights without requiring extensive technical expertise.
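To make cross-filtering and drill-down concrete, here is a minimal sketch in plain Python. It is a conceptual illustration only, not Superset's API or internals: the `sales` rows and the `aggregate` helper are hypothetical, and a real dashboard would push these filters down to the database as SQL.

```python
from collections import defaultdict

# Hypothetical dataset behind a dashboard.
sales = [
    {"region": "Europe", "country": "UK", "product": "A", "amount": 100},
    {"region": "Europe", "country": "France", "product": "B", "amount": 60},
    {"region": "Asia", "country": "Japan", "product": "A", "amount": 80},
    {"region": "Europe", "country": "UK", "product": "B", "amount": 40},
]


def aggregate(rows, dimension, filters=None):
    """Sum amounts by one dimension after applying equality filters."""
    filters = filters or {}
    totals = defaultdict(int)
    for row in rows:
        if all(row[key] == value for key, value in filters.items()):
            totals[row[dimension]] += row["amount"]
    return dict(totals)


# Top-level chart: totals by region.
by_region = aggregate(sales, "region")  # {'Europe': 200, 'Asia': 80}

# Cross-filter: selecting "Europe" in the region chart filters the product chart.
europe_products = aggregate(sales, "product", {"region": "Europe"})  # {'A': 100, 'B': 100}

# Drill-down: the same selection opens the next level of detail, by country.
europe_countries = aggregate(sales, "country", {"region": "Europe"})  # {'UK': 140, 'France': 60}
```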

15:54, 8th July 2022

FOSS For Spectroscopy

A comprehensive catalogue of free and open-source software for spectroscopy has been compiled, covering techniques including NMR, IR, Raman, ESR/EPR, fluorescence, XRF, LIBS and UV-Vis, though mass spectrometry and MRI software are deliberately excluded. The catalogue currently lists 405 entries, the majority written in R or Python, and draws information directly from the respective project repositories and developer pages.

Each entry is lightly vetted to exclude incomplete or unclear projects, and a status date is provided for each to indicate the most recent repository activity, issue filing or package submission, serving as a rough indicator of whether a project is actively maintained. Related resources are also acknowledged, including a comprehensive survey of R packages suited to metabolomics, a curated collection focused on Raman spectroscopy and a task view covering chemometrics and computational physics.
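The status date works as a simple rule: take the most recent of the known activity dates and compare it with today. A minimal Python sketch of that idea follows; the function names are hypothetical, and the one-year threshold is an assumption for illustration, not a rule the catalogue applies.

```python
from datetime import date


def status_date(last_commit=None, last_issue=None, last_submission=None):
    """Most recent of the available activity dates, or None if none are known."""
    known = [d for d in (last_commit, last_issue, last_submission) if d is not None]
    return max(known) if known else None


def looks_maintained(status, today, max_age_days=365):
    """Rough heuristic: treat a project as active if its status date is recent.

    max_age_days is an assumed cut-off, not taken from the catalogue.
    """
    if status is None:
        return False
    return (today - status).days <= max_age_days


status = status_date(last_commit=date(2021, 3, 1), last_issue=date(2022, 5, 20))
print(status, looks_maintained(status, today=date(2022, 7, 8)))  # 2022-05-20 True
```

As the catalogue itself cautions, this is only a rough indicator: a mature, stable package may see little activity and still be perfectly usable.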

15:36, 9th May 2022

9 Free Harvard Courses to Learn Data Science

8 Free MIT Courses to Learn Data Science Online

Both Harvard and MIT offer free online data science courses through their respective open learning platforms, covering a progression from beginner programming through to advanced machine learning. The Harvard pathway consists of nine courses, primarily taught in R, that move through programming basics, data visualisation, probability, statistics, data pre-processing, linear regression and machine learning, culminating in a capstone project that brings all the learnt skills together. The MIT pathway takes a similarly structured approach, beginning with an introductory Python programming course before moving into statistics, foundational mathematics including calculus and linear algebra, and finally an intermediate machine learning programme that requires prior knowledge of all the preceding subjects.

While the MIT courses are noted for their depth and relatively fast pace, the Harvard sequence is designed to cover the full data science workflow in a more applied manner, giving learners the opportunity to work with real-world data throughout. Both pathways are available to audit at no cost, though certificates of completion carry a fee; the Harvard courses are hosted on edX and the MIT courses on MIT OpenCourseWare.

14:32, 28th April 2022

The Book of OHDSI serves as a comprehensive resource detailing the Observational Health Data Sciences and Informatics collaborative, covering its community, data standards and analytical tools. Organised into five sections, it explores topics such as the common data model, standardised vocabularies, data analytics use cases, evidence quality assessment and methodologies for conducting studies within a distributed research network.

Aimed at both newcomers and experienced participants, the book provides theoretical explanations alongside practical guidance on implementing OHDSI initiatives, including cohort definition, population-level estimation and patient-level prediction. It is developed using open-source tools and maintained through continuous community contributions, with updates reflected in the online version to ensure alignment with evolving software and methodologies.
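At its simplest, a cohort definition picks the people who meet an entry event and assigns each an index date. The Python sketch below shows that idea over rows shaped like the common data model's `condition_occurrence` table (`person_id`, `condition_concept_id`, `condition_start_date`). It is a simplification, not OHDSI's tooling: the concept ID is hypothetical, and real definitions also apply observation-period, inclusion and exit criteria.

```python
from datetime import date

# Hypothetical concept ID standing in for a condition of interest; real
# studies use standard concept IDs from the OHDSI vocabularies.
TARGET_CONCEPT_ID = 999999

# Simplified rows shaped like CDM condition_occurrence records.
condition_occurrence = [
    {"person_id": 1, "condition_concept_id": TARGET_CONCEPT_ID, "condition_start_date": date(2019, 4, 2)},
    {"person_id": 1, "condition_concept_id": TARGET_CONCEPT_ID, "condition_start_date": date(2018, 11, 20)},
    {"person_id": 2, "condition_concept_id": 123, "condition_start_date": date(2020, 1, 5)},
    {"person_id": 3, "condition_concept_id": TARGET_CONCEPT_ID, "condition_start_date": date(2021, 6, 30)},
]


def build_cohort(records, concept_id):
    """Map each qualifying person to their first matching condition date (index date)."""
    index_dates = {}
    for rec in records:
        if rec["condition_concept_id"] != concept_id:
            continue
        pid, start = rec["person_id"], rec["condition_start_date"]
        if pid not in index_dates or start < index_dates[pid]:
            index_dates[pid] = start
    return index_dates


cohort = build_cohort(condition_occurrence, TARGET_CONCEPT_ID)
print(cohort)  # {1: datetime.date(2018, 11, 20), 3: datetime.date(2021, 6, 30)}
```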

14:31, 28th April 2022

Observational Health Data Sciences and Informatics

The Observational Health Data Sciences and Informatics initiative brings together researchers, healthcare professionals and data scientists globally to enhance healthcare through large-scale analysis of observational health data. Based at Columbia University, the programme develops open-source tools and fosters collaboration across diverse stakeholders to generate real-world evidence that supports informed clinical decisions and improves patient outcomes. It hosts international events such as the 2026 Global Symposium, which aims to showcase innovations and strengthen partnerships in advancing healthcare research. The organisation also provides educational resources, software and a platform for sharing findings, with ongoing efforts to expand its network and impact through community engagement and scientific exchange.

14:54, 27th April 2022

CDISC Open-Source Alliance

The CDISC Open-Source Alliance maintains a directory of repositories that have been officially recognised as open-source projects aimed at implementing or developing CDISC standards, with the goal of fostering innovation within the CDISC community. Each project must satisfy specific inclusion criteria before being listed in the directory, and smaller projects that emerge from hackathons are catalogued separately in a dedicated hackathons panel.

14:54, 27th April 2022

OpenClinica is a modular eClinical platform designed primarily for small to midsize organisations involved in clinical research, including academic institutions, sponsors, contract research organisations and biotech companies. It brings together electronic data capture, electronic consent, patient-reported outcomes, randomisation, EHR integration, analytics and patient recruitment into a single offering, with the stated aim of reducing the time required to launch a study from several months to just a few weeks.

Each client is assigned a dedicated Customer Success Manager rather than being directed to a general support queue, and the platform also provides around-the-clock application support for most of the working week, alongside an on-demand training system. Organisations can either configure studies themselves using drag-and-drop tools and pre-built templates, or engage the platform's professional services team to handle the build on their behalf.

The platform claims to reduce data queries by around half and to deliver patient recruitment at significantly lower cost per conversion than traditional approaches. Having reportedly supported more than 15,000 studies and three million patients globally, it positions itself as a practical middle ground between overly complex enterprise systems and tools that lack the necessary rigour for regulated clinical research.

15:49, 21st March 2022

Download, Tidy and Visualize Covid-19 Related Data

The Mathematics and Statistics of Infectious Disease Outbreaks

The tidycovid19 R package, created by economist Joachim Gassen, aggregates and tidies COVID-19 related data from multiple authoritative sources to support research into the pandemic, with a particular focus on non-pharmaceutical interventions. Data are drawn from organisations including Johns Hopkins University, the European Centre for Disease Prevention and Control, Our World in Data, the World Bank, ACAPS and Oxford University, as well as from the Google and Apple mobility reports, all accessible through dedicated download functions. The package also includes visualisation tools for plotting the spread of the virus, generating stripe-based country comparisons and mapping global or regional trends, along with a Shiny app for interactive exploration.

A separate but related GitHub repository hosts materials for the MT3002 summer 2020 course on the mathematics and statistics of infectious disease outbreaks, delivered at Stockholm University by Tom Britton and Michael Höhle. It covers topics such as epidemic modelling, reproduction numbers, vaccination, outbreak detection and COVID-19-specific analyses through video lectures, slides and accompanying R code.
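Several of the course topics (epidemic modelling, reproduction numbers and vaccination) meet in the basic SIR model. The course materials use R; the following is an independent Python sketch of the textbook model, not code from the course. The transmission and recovery rates are hypothetical, and the model is integrated with simple forward Euler steps on population proportions. With basic reproduction number R0 = beta / gamma, immunising a share of at least 1 - 1/R0 keeps the effective reproduction number below 1.

```python
def simulate_sir(beta, gamma, s0, i0, days, dt=0.1):
    """Integrate the SIR model (population proportions) with forward Euler steps.

    Returns final susceptible, infectious and recovered proportions, plus
    the peak infectious proportion seen during the run.
    """
    s, i, r = s0, i0, 1.0 - s0 - i0  # anyone not S or I starts as immune (R)
    peak_i = i
    for _ in range(round(days / dt)):
        new_infections = beta * s * i * dt
        recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - recoveries
        r += recoveries
        peak_i = max(peak_i, i)
    return s, i, r, peak_i


beta, gamma = 0.5, 0.25          # hypothetical rates per day
r0 = beta / gamma                # basic reproduction number: 2.0
critical_coverage = 1 - 1 / r0   # immune share needed to push R below 1: 0.5

# Almost fully susceptible population: the outbreak takes off.
s_end, i_end, r_end, peak = simulate_sir(beta, gamma, s0=0.999, i0=0.001, days=365)

# 60% already immune (above the critical coverage): infections only decline.
s_v, i_v, r_v, peak_v = simulate_sir(beta, gamma, s0=0.399, i0=0.001, days=365)
print(round(s_end, 3), round(peak, 3), peak_v)
```

In the first run roughly four in five people are eventually infected; in the second the infectious share never rises above its starting value, which is the threshold effect behind vaccination targets.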

  • The content, images, and materials on this website are protected by copyright law and may not be reproduced, distributed, transmitted, displayed, or published in any form without the prior written permission of the copyright holder. All trademarks, logos, and brand names mentioned on this website are the property of their respective owners. Unauthorised use or duplication of these materials may violate copyright, trademark and other applicable laws, and could result in criminal or civil penalties.
