Technology Tales

Notes drawn from experiences in consumer and enterprise technology

09:01, 4th August 2021

NumFOCUS: A Nonprofit Supporting Open Code for Better Science

NumFOCUS supports open-source projects used by organisations ranging from major technology companies to research institutions, aiming to address complex challenges through collaborative development. It offers opportunities for community engagement, employment in open-source roles and ways for individuals and organisations to contribute financially or through sponsorship. The organisation also provides resources such as a newsletter, annual reports and a shop where proceeds support its initiatives, while maintaining a focus on fostering innovation and accessibility in scientific computing through its various programs and partnerships.

08:59, 27th May 2021

Apache ORC is a columnar storage format designed for Hadoop workloads, offering efficient data handling through features such as ACID transaction support, built-in indexes for rapid data retrieval and compatibility with complex data types including structs, lists and maps. It is maintained under the Apache Software Foundation, the non-profit organisation that provides governance for open-source projects released under the Apache License. The project provides documentation and tools for integration with frameworks such as Spark, Hive and Hadoop.
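
As a rough illustration of why built-in indexes speed up retrieval in a columnar layout, here is a minimal Python sketch. It is not ORC's API or file format: the function names, the group size and the sample values are all hypothetical. It keeps minimum and maximum values for each group of rows in a column, so that a scan can skip any group that cannot contain a match.

```python
from typing import List, Tuple

GROUP_SIZE = 4  # hypothetical; real formats use far larger groups


def build_index(column: List[int], group_size: int = GROUP_SIZE) -> List[Tuple[int, int]]:
    """Record (min, max) for each consecutive group of values in a column."""
    index = []
    for start in range(0, len(column), group_size):
        group = column[start:start + group_size]
        index.append((min(group), max(group)))
    return index


def scan_greater_than(
    column: List[int],
    index: List[Tuple[int, int]],
    threshold: int,
    group_size: int = GROUP_SIZE,
) -> Tuple[List[int], int]:
    """Return the values above threshold and the number of groups actually read."""
    matches: List[int] = []
    groups_read = 0
    for i, (_, high) in enumerate(index):
        if high <= threshold:
            continue  # no value in this group can match, so skip it entirely
        groups_read += 1
        start = i * group_size
        matches.extend(v for v in column[start:start + group_size] if v > threshold)
    return matches, groups_read


# Hypothetical column of twelve values, stored as three groups of four.
prices = [3, 5, 2, 4, 10, 12, 11, 9, 1, 2, 3, 1]
price_index = build_index(prices)
result, groups_read = scan_greater_than(prices, price_index, 8)
```

With this sample data, only the middle group is read, because the index shows that the other two groups hold nothing above the threshold.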

13:47, 16th May 2021

Machine Learning Operations

MLOps represents an emerging approach to managing the full lifecycle of machine learning development, aiming to integrate ML models into software engineering practices by unifying release cycles, enabling automated testing of models and data, applying agile methodologies and embedding ML components within continuous integration and delivery systems. It addresses challenges such as technical debt and emphasises cross-platform compatibility, while covering topics like workflow design, governance, deployment strategies and standardised processes for model development. The framework also includes tools for structuring infrastructure and defining governance practices, presented through various phases and principles to support the iterative and complex nature of ML-based software projects.
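
The idea of testing both data and models automatically before a release can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the field names and the accuracy threshold are hypothetical and are not taken from any MLOps framework. A continuous integration job could call such a gate and fail the build when it returns False.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


def validate_data(rows: List[Row], required_fields: List[str]) -> List[str]:
    """Return a list of problems found in the rows; an empty list means the data passed."""
    if not rows:
        return ["no rows supplied"]
    problems = []
    for i, row in enumerate(rows):
        for field in required_fields:
            if field not in row:
                problems.append(f"row {i}: missing {field}")
    return problems


def accuracy(model: Callable[[Row], float], rows: List[Row], label_field: str) -> float:
    """Fraction of rows where the model's prediction equals the label."""
    correct = sum(1 for row in rows if model(row) == row[label_field])
    return correct / len(rows)


def release_gate(
    model: Callable[[Row], float],
    rows: List[Row],
    required_fields: List[str],
    label_field: str,
    min_accuracy: float,
) -> bool:
    """Pass only if the data checks succeed and accuracy meets the threshold."""
    if validate_data(rows, [*required_fields, label_field]):
        return False  # never score a model on data that failed validation
    return accuracy(model, rows, label_field) >= min_accuracy


# Hypothetical model and evaluation rows, for illustration only.
def threshold_model(row: Row) -> float:
    return 1.0 if row["x"] > 0.5 else 0.0


eval_rows = [
    {"x": 0.9, "label": 1.0},
    {"x": 0.1, "label": 0.0},
    {"x": 0.7, "label": 0.0},
    {"x": 0.2, "label": 0.0},
]
passes = release_gate(threshold_model, eval_rows, ["x"], "label", min_accuracy=0.7)
```

Checking the data first is a deliberate choice here: a model score computed on malformed data would say little about whether the release is safe.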

09:34, 13th May 2021

Business Process Model and Notation (BPMN) offers a standardised graphical method for organisations to visualise and communicate internal procedures, enhancing clarity in business operations and facilitating collaboration between entities. The BPMN specification, including versions such as 2.0 and supporting resources like quick guides and examples, provides a framework for consistent process representation. Certification programs, such as the OCEB 2 initiative, aim to validate expertise in enterprise BPMN through structured examinations, with credentials serving as a benchmark for professional competence in the field. Resources and frequently asked questions are available to support understanding and implementation of BPMN standards.

10:49, 12th May 2021

SAS User Group UK & Ireland

The Independent SAS Language Community is a London-based group with over 1,200 members, bringing together those with an interest in the SAS programming language across areas such as analytics, modelling and administration. The community holds regular meetups and has hosted more than 100 past events, covering topics ranging from mainframe modernisation and data analytics to the role of SAS in a world where languages such as R and Python are increasingly prevalent. The group is well regarded, holding a rating of 4.7 out of 5, and is organised by Andrew Ratcliffe among others. It actively seeks presenters, venues, volunteers, sponsors and ideas from those who wish to get involved and can also be found on LinkedIn.

10:47, 12th May 2021

Top YouTube Machine Learning Channels

KDnuggets recently identified the top 15 YouTube channels for machine learning based on a combination of views per video, subscriber count and video quantity, using search criteria focused on relevance and activity over the past year. Channels with fewer than 100,000 views or no updates in 12 months were excluded, and one further channel was omitted due to recent controversies, leaving a final list that highlights creators offering content ranging from foundational concepts to advanced applications in the field. The channels vary in focus, from educational tutorials to research insights, with descriptions provided where available to aid viewers in selecting content aligned with their learning goals.

10:47, 12th May 2021

Top YouTube Channels for Data Science

KDnuggets compiled a ranking of the top 15 YouTube channels covering data science content, determined by searching the platform using the term "data science", scraping the top 100 channel results, removing those without publicly available subscriber counts and re-sorting by subscriber numbers. The resulting list is led by Edureka!, which boasts over 2.4 million subscribers and more than 197 million views, followed by Joma Tech and Simplilearn. Other notable channels include StatQuest with Josh Starmer, which focuses on breaking down statistical and machine learning concepts into digestible steps, and Ken Jee, who combines data science with sports analytics. Channels such as Data School and 365 Data Science cater to those seeking structured learning paths, while Andreas Kretz focuses specifically on data engineering. The ranking also incorporates total view counts and views per subscriber to give a fuller picture of each channel's reach and engagement, with the list capped at 15 entries on the basis that channels beyond that threshold offered diminishing relevance to the data science field.
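
The ranking steps described above (drop channels without a public subscriber count, re-sort by subscribers, cap the list and report views per subscriber) can be sketched in Python as follows. This is not the code KDnuggets used, which is not shown in the source. The class, the function names and the sample channels are hypothetical placeholders, and the figures are invented for illustration rather than taken from any real channel.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Channel:
    name: str
    subscribers: Optional[int]  # None when the count is not publicly available
    views: int


def rank_channels(results: List[Channel], limit: int = 15) -> List[Channel]:
    """Drop channels with hidden subscriber counts, sort by subscribers, keep the top few."""
    visible = [c for c in results if c.subscribers is not None]
    return sorted(visible, key=lambda c: c.subscribers, reverse=True)[:limit]


def views_per_subscriber(channel: Channel) -> float:
    """Engagement measure: total views divided by subscriber count."""
    if not channel.subscribers:
        return 0.0  # avoid dividing by zero or by a hidden count
    return channel.views / channel.subscribers


# Placeholder search results, not real channel statistics.
search_results = [
    Channel("Channel A", 500, 10_000),
    Channel("Channel B", None, 99_999),
    Channel("Channel C", 2_000, 30_000),
    Channel("Channel D", 1_200, 6_000),
]
top_two = rank_channels(search_results, limit=2)
engagement = {c.name: views_per_subscriber(c) for c in top_two}
```

Sorting on subscribers alone favours large channels, which is why a ratio such as views per subscriber is a useful second column when comparing entries.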

12:48, 15th April 2021

Apache Arrow is a columnar memory format designed for efficient data interchange and in-memory analytics, enabling fast access to structured and nested data across modern computing hardware. It supports zero-copy reads to eliminate serialisation overhead and is implemented through libraries available in multiple programming languages, facilitating high-performance analytics and integration with various tools. The project is community-driven, emphasising open collaboration and consensus-based decision-making, with contributions from diverse organisations and individuals.
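
To make the ideas of a columnar layout and zero-copy reads more concrete, here is a small Python sketch that uses only the standard library. It does not use Arrow's libraries or reproduce its memory format, and the column names and values are hypothetical. Each column lives in one contiguous typed buffer, and a memoryview slice exposes part of that buffer without copying it.

```python
from array import array

# One contiguous typed buffer per column, rather than one object per row.
table = {
    "id": array("q", [1, 2, 3, 4]),              # 64-bit signed integers
    "score": array("d", [0.5, 1.5, 2.5, 3.5]),   # 64-bit floats
}

# A memoryview exposes the column's underlying bytes without copying them.
scores = memoryview(table["score"])
window = scores[1:3]  # still a view over the same memory, not a new list

# Writing through the original column is visible through the view,
# which shows that no copy was made when the slice was taken.
table["score"][1] = 9.0
shared = window.tolist()
```

A reader that receives such a view can work on the data in place, which is the saving that zero-copy designs aim for when data moves between tools.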

15:12, 18th November 2020

Conda is an open-source tool for package, dependency and environment management that supports multiple programming languages. It can be installed via two main distributions: Miniconda, an installer from Anaconda that is preconfigured for Anaconda's default channels, and Miniforge, which is maintained by the conda-forge community and preconfigured for the conda-forge channel, with both available through Homebrew. Documentation covers the essentials for new users, including environment creation and management, alongside a full command reference, configuration guidance, cheat sheets and a glossary. Those wishing to contribute to the project can find guides covering project governance, contribution processes and development environment setup.

13:53, 23rd October 2020

rOpenSci Packages: Development, Maintenance, and Peer Review

The rOpenSci Dev Guide provides comprehensive guidance for individuals involved in the development, maintenance and peer review of R packages within the rOpenSci ecosystem. It outlines best practices for creating and testing packages, explains the peer review process, including roles and responsibilities for authors, reviewers and editors, and offers strategies for sustaining packages post-onboarding through collaboration, documentation, promotion and leveraging GitHub as a development platform. The guide also includes templates and resources to support various stages of package management and contribution, emphasising structured workflows and community engagement.

  • The content, images, and materials on this website are protected by copyright law and may not be reproduced, distributed, transmitted, displayed, or published in any form without the prior written permission of the copyright holder. All trademarks, logos, and brand names mentioned on this website are the property of their respective owners. Unauthorised use or duplication of these materials may violate copyright, trademark and other applicable laws, and could result in criminal or civil penalties.