13:47, 16th May 2021
Machine Learning Operations
MLOps is an emerging approach to managing the full machine learning lifecycle. It aims to bring ML models into established software engineering practice by unifying the release cycles of models and applications, enabling automated testing of both models and data, applying agile methodologies and embedding ML components in continuous integration and delivery (CI/CD) pipelines. It addresses challenges such as technical debt and emphasises cross-platform compatibility, covering topics including workflow design, governance, deployment strategies and standardised processes for model development. The material also offers tools for structuring infrastructure and defining governance practices, organised into phases and principles that reflect the iterative and complex nature of ML-based software projects.
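As a rough illustration of what automated testing of models and data can look like inside a CI pipeline, the sketch below runs two kinds of check: a data check that flags missing columns and NaN values in a batch of training rows, and a model check that compares hold-out accuracy against a threshold. The function names, the row format (a list of dicts) and the threshold are hypothetical choices for this example, not part of any particular MLOps framework.

```python
from __future__ import annotations

import math
from typing import Callable, Iterable


def check_data(rows: list[dict], required_columns: Iterable[str]) -> list[str]:
    """Hypothetical data test: report missing columns and NaN values per row."""
    required = set(required_columns)
    problems = []
    for i, row in enumerate(rows):
        missing = required - set(row)
        if missing:
            problems.append(f"row {i}: missing {sorted(missing)}")
        for key, value in row.items():
            # Only float values can be NaN; other types are left alone.
            if isinstance(value, float) and math.isnan(value):
                problems.append(f"row {i}: {key} is NaN")
    return problems


def check_model(predict: Callable[[dict], int],
                holdout: list[tuple[dict, int]],
                min_accuracy: float) -> tuple[bool, float]:
    """Hypothetical model test: accuracy on a labelled hold-out set vs a threshold."""
    if not holdout:
        raise ValueError("hold-out set is empty")
    correct = sum(1 for features, label in holdout if predict(features) == label)
    accuracy = correct / len(holdout)
    return accuracy >= min_accuracy, accuracy
```

In a CI job, a non-empty problem list or a failed accuracy check would fail the build, so a model is only released when both its input data and its behaviour pass.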
09:34, 13th May 2021
Business Process Model and Notation (BPMN) offers a standardised graphical method for organisations to visualise and communicate their internal procedures, making business operations clearer and easing collaboration between parties. The BPMN specification, including version 2.0, together with supporting resources such as quick guides and examples, provides a framework for representing processes consistently. Certification programmes such as OCEB 2 (OMG Certified Expert in BPM) aim to validate expertise in business process management through structured examinations, with the resulting credentials serving as a benchmark of professional competence in the field. Resources and frequently asked questions are available to support understanding and implementation of the BPMN standard.
10:49, 12th May 2021
SAS User Group UK & Ireland
The Independent SAS Language Community is a London-based group with over 1,200 members, bringing together those with an interest in the SAS programming language across areas such as analytics, modelling and administration. The community holds regular meetups and has hosted more than 100 past events, covering topics ranging from mainframe modernisation and data analytics to the role of SAS in a world where languages such as R and Python are increasingly prevalent. The group is well regarded, holding a rating of 4.7 out of 5, and is organised by Andrew Ratcliffe among others. It actively seeks presenters, venues, volunteers, sponsors and ideas from those who wish to get involved and can also be found on LinkedIn.
10:47, 12th May 2021
Top YouTube Channels for Data Science
KDnuggets compiled a ranking of the top 15 YouTube channels covering data science content, determined by searching the platform using the term "data science", scraping the top 100 channel results, removing those without publicly available subscriber counts and re-sorting by subscriber numbers. The resulting list is led by Edureka!, which boasts over 2.4 million subscribers and more than 197 million views, followed by Joma Tech and Simplilearn. Other notable channels include StatQuest with Josh Starmer, which focuses on breaking down statistical and machine learning concepts into digestible steps, and Ken Jee, who combines data science with sports analytics. Channels such as Data School and 365 Data Science cater to those seeking structured learning paths, while Andreas Kretz focuses specifically on data engineering. The ranking also incorporates total view counts and views per subscriber to give a fuller picture of each channel's reach and engagement, with the list capped at 15 entries on the basis that channels beyond that threshold offered diminishing relevance to the data science field.
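The ranking procedure described above (drop channels with hidden subscriber counts, re-sort by subscribers, cap the list at 15 and add views per subscriber as an engagement measure) can be sketched in a few lines. The search and scraping steps are not shown, and the `Channel` type and `rank_channels` helper are hypothetical; this is an illustration of the described steps, not KDnuggets' actual code.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Channel:
    name: str
    subscribers: Optional[int]  # None when the channel hides its subscriber count
    views: int


def rank_channels(results: list[Channel], limit: int = 15) -> list[tuple[str, int, int, float]]:
    """Hypothetical re-ranking of scraped search results by subscriber count."""
    # Drop channels without a public subscriber count (zero is dropped too, avoiding division by zero).
    visible = [c for c in results if c.subscribers]
    # Re-sort by subscribers, largest first, and keep only the top `limit` entries.
    top = sorted(visible, key=lambda c: c.subscribers, reverse=True)[:limit]
    # Report total views and views per subscriber alongside the subscriber count.
    return [(c.name, c.subscribers, c.views, round(c.views / c.subscribers, 1)) for c in top]
```

Filtering before sorting means channels with hidden counts can never occupy one of the 15 places, which matches the order of steps in the description.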
10:47, 12th May 2021
Top YouTube Machine Learning Channels
KDnuggets recently identified the top 15 YouTube channels for machine learning, ranked on a combination of views per video, subscriber count and number of videos, using search criteria focused on relevance and on activity over the past year. Channels with fewer than 100,000 views or no updates in the previous 12 months were excluded, and one further channel was omitted because of recent controversies. The final list highlights creators whose content ranges from foundational concepts to advanced applications and whose focus varies from educational tutorials to research insights, with descriptions provided where available to help viewers choose content that matches their learning goals.
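The exclusion rules described above amount to a simple filter. This sketch assumes each channel record carries a total view count and the date of its most recent upload, approximates "12 months" as 365 days, and models the manually omitted channel as an exclusion set; all names are hypothetical, and the composite ranking applied afterwards is not reproduced because its weighting is not given.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Channel:
    name: str
    total_views: int
    last_upload: date


def eligible_channels(channels: list[Channel], today: date,
                      excluded: frozenset[str] = frozenset(),
                      min_views: int = 100_000,
                      max_age_days: int = 365) -> list[Channel]:
    """Hypothetical filter: enough views, recently active, not manually excluded."""
    cutoff = today - timedelta(days=max_age_days)
    return [
        c for c in channels
        if c.total_views >= min_views   # drop channels under 100,000 views
        and c.last_upload >= cutoff     # drop channels inactive for roughly 12 months
        and c.name not in excluded      # drop channels removed by editorial judgement
    ]
```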
12:48, 15th April 2021
Apache Arrow is a columnar memory format designed for efficient data interchange and in-memory analytics, enabling fast access to structured and nested data across modern computing hardware. It supports zero-copy reads to eliminate serialisation overhead and is implemented through libraries available in multiple programming languages, facilitating high-performance analytics and integration with various tools. The project is community-driven, emphasising open collaboration and consensus-based decision-making, with contributions from diverse organisations and individuals.
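To make the columnar idea concrete, the following pure-Python sketch encodes a column of optional strings into the three buffers that Arrow's variable-size binary layout uses: a validity bitmap (one bit per slot, least-significant bit first), an offsets buffer with one more entry than there are values, and a single contiguous data buffer. Reading a value then needs only two offsets and a slice, and slicing a `memoryview` avoids copying the bytes. The helper names are hypothetical and this is not the pyarrow API; in practice one of the official Arrow libraries would be used, since they also handle details such as buffer alignment and padding.

```python
from __future__ import annotations

from array import array
from typing import Optional


def encode_strings(values: list[Optional[str]]) -> tuple[bytearray, array, bytes]:
    """Build Arrow-style validity, offsets and data buffers for a string column."""
    validity = bytearray((len(values) + 7) // 8)  # one bit per slot, initially all null
    offsets = array("i", [0])                     # C int offsets (32-bit on common platforms), length n + 1
    data = bytearray()
    for i, value in enumerate(values):
        if value is not None:
            validity[i // 8] |= 1 << (i % 8)      # mark slot i as valid (LSB-first)
            data += value.encode("utf-8")
        offsets.append(len(data))                 # null slots get a zero-length range
    return validity, offsets, bytes(data)


def read_value(validity: bytearray, offsets: array, data: bytes, i: int) -> Optional[memoryview]:
    """Return slot i as a view into the data buffer, or None if it is null."""
    if not (validity[i // 8] >> (i % 8)) & 1:
        return None
    # Slicing a memoryview references the existing buffer instead of copying it.
    return memoryview(data)[offsets[i]:offsets[i + 1]]
```

For example, encoding `["arrow", None, "pa"]` gives offsets 0, 5, 5, 7 and data `b"arrowpa"`, with the null in slot 1 represented only by a cleared validity bit.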
15:12, 18th November 2020
Conda is an open-source package, dependency and environment manager that supports multiple programming languages. It can be installed through two main distributions: Miniconda, a minimal installer preconfigured by Anaconda, and Miniforge, which is maintained by the conda-forge community and preconfigured for the conda-forge channel; both are also available through Homebrew. The documentation covers the essentials for new users, including creating and managing environments, alongside a full command reference, configuration guidance, cheat sheets and a glossary. Those wishing to contribute to the project can find guides on project governance, the contribution process and setting up a development environment.
13:53, 23rd October 2020
rOpenSci Packages: Development, Maintenance, and Peer Review
The rOpenSci Dev Guide provides comprehensive guidance for anyone involved in developing, maintaining or peer reviewing R packages within the rOpenSci ecosystem. It outlines best practices for creating and testing packages, explains the peer review process, including the roles and responsibilities of authors, reviewers and editors, and offers strategies for sustaining packages after onboarding through collaboration, documentation, promotion and the use of GitHub as a development platform. The guide also includes templates and resources for the various stages of package management and contribution, emphasising structured workflows and community engagement.
09:21, 7th October 2020
rOpenSci promotes open and reproducible research by developing shared software tools and fostering collaboration among researchers and engineers through the R programming language. It employs a peer review process to validate and improve scientific software, supporting a community of developers and maintainers who contribute to a wide range of packages across disciplines such as statistics, data visualisation and geospatial analysis.
Initiatives like the Champions Program encourage leadership in open science, while projects such as R-Universe provide platforms for discovering and publishing R tools. Efforts to expand accessibility include translating documentation into multiple languages and creating resources for open-source contributions. The organisation collaborates with institutions and supports workflows that enhance data analysis, security and reproducibility in scientific research.
10:47, 18th September 2020
Open-Source Portal for Clinical Study Evaluations
This open-source portal has been developed to serve as a centralised collection of links, programmes and scripts related to clinical study evaluations, addressing the difficulty researchers face when trying to identify what open-source solutions exist in this space. The portal includes metadata storage and user-friendly search and navigation tools, with content gathered manually and supplemented through header analysis and available metadata. It has grown over time to include tools such as DaVinci, carver, oak and teal, as well as other R packages, conference videos, podcasts and an education section. Plans include enabling user ratings and edits, expanding available content, and developing a process for managing and downloading open-source SAS macros in a manner comparable to how R packages are installed.