18:35, 26th July 2022
How to update/upgrade Debian/Ubuntu Linux using Ansible
Keeping Debian and Ubuntu Linux servers up to date is essential for security and stability, particularly when managing multiple machines. Ansible simplifies this process by using its apt module to refresh the package cache and upgrade all installed packages across servers simultaneously, employing apt-get rather than aptitude to handle the updates. Where kernel upgrades are applied, a reboot is often required, and Ansible can check for the presence of the /var/run/reboot-required file to determine whether this is necessary, automatically rebooting any affected server and waiting for it to come back online before continuing. A hosts file is used to define the target servers, and the full logic is written into a playbook that can be executed with a single command, making the entire update and reboot process repeatable and consistent across a Debian-based infrastructure.
18:35, 26th July 2022
ansible.builtin.apt module – Manages apt-packages
The apt module serves as a tool for managing packages on Debian-based systems, enabling actions such as installation, removal and updates. It allows users to specify the desired state of a package, whether present, absent or latest, and includes options for handling dependencies and cache updates. Parameters like update_cache ensure repositories are refreshed before operations, while features such as default_release and allow_downgrade provide control over version selection and conflict resolution.
The module supports installing specific versions of packages, managing build dependencies and performing system upgrades through options like dist, safe or full. Its functionality extends to cleaning caches, removing unused dependencies and purging configuration files when necessary. By integrating these capabilities, the module streamlines package management tasks, offering flexibility for both routine maintenance and targeted installations.
18:33, 21st July 2022
Using cURL in Python with PycURL
PycURL is a Python interface to the libcurl library that enables data transfer to and from servers, supporting multiple protocols including HTTPS, FTPS, SMTP, IMAP and SMB, among others. It is particularly well suited to testing REST APIs, downloading files and handling large numbers of concurrent connections, where it can be noticeably faster than the popular Python Requests library.
Installation is straightforward across operating systems, with Mac and Linux requiring no additional dependencies, while Windows users may need to address a few prerequisites beforehand. Once installed, PycURL can be used to perform HTTP GET requests to retrieve data from a given URL, examine response headers, send form data via POST, upload files using either multipart POST or PUT requests, send DELETE requests to remove server-side resources and write responses directly to a local file, all through a relatively consistent coding pattern built around the setopt function and the Curl object.
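The setopt pattern mentioned above can be sketched as follows. This is a minimal illustration rather than the article's own code: the helper names fetch and parse_header_line are hypothetical, and pycurl is imported inside the function only so that the snippet loads on machines where PycURL is not installed.

```python
from io import BytesIO


def parse_header_line(raw: bytes, headers: dict) -> None:
    """Store one raw response header line in `headers` (hypothetical helper)."""
    line = raw.decode("iso-8859-1")
    if ":" not in line:
        return  # status line or the blank line that ends the header block
    name, value = line.split(":", 1)
    headers[name.strip().lower()] = value.strip()


def fetch(url: str):
    """Perform an HTTP GET and return (status code, headers, body bytes)."""
    import pycurl  # assumption: PycURL is installed; imported lazily on purpose

    body = BytesIO()
    headers = {}
    curl = pycurl.Curl()
    try:
        curl.setopt(pycurl.URL, url)
        curl.setopt(pycurl.WRITEDATA, body)  # response body is written here
        curl.setopt(
            pycurl.HEADERFUNCTION,
            lambda raw: parse_header_line(raw, headers),
        )
        curl.perform()
        status = curl.getinfo(pycurl.RESPONSE_CODE)
    finally:
        curl.close()
    return status, headers, body.getvalue()
```

Calling fetch() with any reachable URL (for instance a placeholder such as https://example.com) should return the status code, a dictionary of lower-cased header names and the raw body; other request types follow the same shape with different setopt options.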
15:54, 8th July 2022
FOSS4Spectroscopy: R vs Python
Bryan Hanson's FOSS for Spectroscopy project, which catalogues free and open-source software for spectroscopic applications, received a major update in July 2022, at which point Python packages outnumbered R packages by more than two to one. Packages are discovered by searching four repositories, namely CRAN, GitHub, PyPI.org and juliapackages.org, across a range of spectroscopic topics including NMR, Raman, FT-IR, XRF and several others. To support automated searching of GitHub and PyPI.org, Hanson developed a package called webu, which handles API queries and incorporates deliberate delays to avoid overloading servers. Raw search results require considerable manual inspection and cleaning before they are suitable for inclusion in the main database, with additional scripts used to remove duplicate entries and resolve inconsistencies in package naming conventions.
09:02, 1st July 2022
The htmlwidgets package enables the integration of JavaScript-based data visualisations into R, allowing users to generate interactive plots directly at the R console, embed them within R Markdown documents or Shiny applications and develop custom widgets that bridge R and JavaScript seamlessly. Tools such as flexdashboard facilitate the arrangement of multiple widgets into flexible layouts, while crosstalk supports linking visualisations for coordinated interactions. The framework also provides guidance for creating new widgets, expanding the range of available visualisations and enhancing their use in analytical workflows.
14:05, 29th June 2022
Semantic Versioning offers a structured approach to version numbering, enabling developers to communicate changes clearly and manage dependencies effectively. By adhering to a formal specification, version numbers become meaningful indicators of compatibility and stability.
The system divides updates into major, minor and patch levels, with major versions reserved for backward-incompatible changes, minor for backward-compatible new features and patch for backward-compatible bug fixes. This clarity allows dependent projects to specify version ranges that ensure compatibility without requiring constant updates.
The specification emphasises the importance of a well-defined public API, particularly when releasing version 1.0.0, which signals a stable foundation for users. During initial development (0.y.z), rapid iteration is encouraged, but once the API solidifies, careful consideration of backward compatibility becomes essential. Deprecating features requires documentation and a minor version release to allow users time to adapt before removal in a major update. Practical challenges, such as accidental versioning errors or managing dependencies without altering the public API, are addressed through guidelines on corrective releases and evaluating the impact of changes.
While the system may seem rigid, it encourages thoughtful development, ensuring that incompatible changes are introduced only when necessary. By linking to the specification in project documentation, developers invite others to benefit from consistent practices, reducing the friction of dependency management. The approach avoids arbitrary versioning, prioritising transparency and predictability. It acknowledges the complexity of real-world software but provides a framework to navigate it systematically, ensuring that updates are both intentional and manageable.
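The major, minor and patch levels described above can be illustrated with a short sketch. It is a simplification under stated assumptions, not a full implementation of the specification: pre-release and build metadata are ignored, and the Version class and is_compatible helper are hypothetical names chosen for this example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Version:
    """A bare MAJOR.MINOR.PATCH version (pre-release tags not handled)."""
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", text)
        if match is None:
            raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
        return cls(*(int(part) for part in match.groups()))

    def bump(self, level: str) -> "Version":
        """Return the next version; lower levels reset to zero."""
        if level == "major":  # backward-incompatible change
            return Version(self.major + 1, 0, 0)
        if level == "minor":  # backward-compatible feature
            return Version(self.major, self.minor + 1, 0)
        if level == "patch":  # backward-compatible bug fix
            return Version(self.major, self.minor, self.patch + 1)
        raise ValueError(f"unknown level: {level!r}")

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


def is_compatible(installed: Version, required: Version) -> bool:
    """Assumed range rule: same major version and at least the required one.

    For 0.y.z anything may change, so this sketch insists on an exact match.
    """
    if required.major == 0:
        return installed == required
    return installed.major == required.major and installed >= required
```

Because the fields are compared in order, Version.parse("1.10.0") sorts after Version.parse("1.9.3"), which plain string comparison would get wrong.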
16:23, 16th June 2022
In Python, two standard approaches exist for checking whether a file exists before performing operations on it. The first uses the exists() function from the os.path module, which accepts a path as its argument and returns True if the path exists or False if it does not. The second uses the is_file() method of the Path class in the pathlib module, available since Python 3.4, which follows an object-oriented approach. The two are not exact equivalents: exists() also returns True for directories, whereas is_file() returns True only for regular files (os.path.isfile() is the function-style counterpart). When specifying file paths with either method, forward slashes can be used as separators, as this works consistently across Windows, macOS and Linux.
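A minimal sketch of both checks follows. The temporary directory and the file names are illustrative only, and the describe helper is a hypothetical name; the example also shows that os.path.exists() reports True for a directory while the file-specific checks do not.

```python
import os
import tempfile
from pathlib import Path


def describe(path: str) -> dict:
    """Run the three checks on one path and collect the results."""
    return {
        "os.path.exists": os.path.exists(path),
        "os.path.isfile": os.path.isfile(path),
        "Path.is_file": Path(path).is_file(),
    }


with tempfile.TemporaryDirectory() as folder:
    file_path = os.path.join(folder, "report.txt")
    Path(file_path).write_text("hello")

    file_checks = describe(file_path)      # a regular file
    dir_checks = describe(folder)          # a directory
    missing_checks = describe(os.path.join(folder, "absent.txt"))

print(file_checks)
print(dir_checks)
print(missing_checks)
```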
16:22, 16th June 2022
Python – List Files in a Directory
Python's standard library offers several functions for retrieving lists of files and directories stored on a computer. The os module provides three key functions for this purpose: os.listdir(), which returns the names of the entries in a specified directory without going deeper than the first level; os.walk(), which traverses an entire directory tree and is useful for locating specific file types across multiple folders; and os.scandir(), a more efficient alternative to os.listdir() that is available in Python 3.5 and above and yields entries that carry file-type information. For more flexible retrieval using pattern matching with wildcards, the glob module offers two options: glob.glob(), which returns a list of matching file paths, and glob.iglob(), which returns an iterator instead and is better suited to large directories because it does not build the whole list in memory.
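The functions above can be compared side by side on a small throwaway directory tree. This is a sketch for illustration; the file and folder names are invented for the example.

```python
import glob
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    # Build a tiny tree: two files at the top level and one in a subfolder.
    Path(root, "a.txt").write_text("a")
    Path(root, "b.csv").write_text("b")
    Path(root, "sub").mkdir()
    Path(root, "sub", "c.txt").write_text("c")

    # os.listdir: names at the first level only (files and folders alike).
    top_level = sorted(os.listdir(root))

    # os.walk: visit every folder and keep the .txt files, relative to root.
    txt_in_tree = sorted(
        os.path.relpath(os.path.join(dirpath, name), root)
        for dirpath, _dirnames, filenames in os.walk(root)
        for name in filenames
        if name.endswith(".txt")
    )

    # os.scandir: first level again, but each entry knows its own type.
    with os.scandir(root) as entries:
        files_only = sorted(entry.name for entry in entries if entry.is_file())

    # glob.glob: wildcard match, returned as a list.
    matched = sorted(
        os.path.basename(p) for p in glob.glob(os.path.join(root, "*.txt"))
    )

    # glob.iglob: same idea as an iterator; "**" with recursive=True descends.
    pattern = os.path.join(root, "**", "*.txt")
    lazy_matched = sorted(
        os.path.basename(p) for p in glob.iglob(pattern, recursive=True)
    )

print(top_level, txt_in_tree, files_only, matched, lazy_matched)
```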
16:21, 16th June 2022
Create an empty array in Python
Python offers several approaches to initialising an empty array, with the best choice depending on the specific use case. A standard Python list, created using either square brackets or the list() constructor, is the most flexible and commonly used option, as it can hold any data type and grows dynamically. For memory-efficient storage of uniformly typed data, the array module provides a typed alternative that requires all elements to share the same data type.
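As a brief sketch of the two standard-library options, the snippet below creates an empty list and an empty typed array; the variable names and the type codes chosen ("i" for signed integers, "d" for double-precision floats) are illustrative.

```python
from array import array

# Two equivalent ways to create an empty list.
empty_list = []
also_empty = list()

# A list accepts any mix of types and grows as items are appended.
empty_list.append(1)
empty_list.append("two")

# A typed array must be given a type code when it is created.
empty_ints = array("i")
empty_floats = array("d")
empty_ints.append(3)

# Appending a value of the wrong type is rejected.
try:
    empty_ints.append("four")
    rejected = False
except TypeError:
    rejected = True

print(empty_list, list(empty_ints), rejected)
```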
In data science and machine learning contexts, NumPy is generally the preferred choice due to its speed and mathematical capabilities, offering multiple initialisation methods including np.array(), np.zeros() and np.empty(), the last of which is the fastest as it allocates memory without setting initial values. A key distinction to bear in mind is that standard Python lists are dynamic whilst NumPy arrays are intended to remain static in size, meaning that frequently resizing a NumPy array is a signal that a list may be the more appropriate starting point, with conversion to NumPy carried out only once the data collection is complete.
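The NumPy initialisation methods and the list-first pattern described above might look like this in practice. It is a sketch that assumes NumPy is installed; the sizes and values are arbitrary.

```python
import numpy as np

# An array with no elements.
empty = np.array([])

# Three elements, all set to 0.0.
zeros = np.zeros(3)

# Three elements, memory allocated but NOT initialised: the contents are
# arbitrary until assigned, so fill the array before reading from it.
scratch = np.empty(3)
scratch[:] = [1.0, 2.0, 3.0]

# When the final size is unknown, collect in a list and convert once,
# rather than resizing a NumPy array repeatedly.
collected = []
for step in range(5):
    collected.append(step * 0.5)
samples = np.array(collected)

print(empty.shape, zeros, scratch, samples)
```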
14:35, 15th June 2022
Pandas Convert List of Dictionaries to DataFrame
When a list of Python dictionaries (dict) is converted to a Pandas DataFrame, each dictionary becomes a row: its keys serve as column names and its values populate the corresponding cells. Several methods can be used to perform this conversion, including pd.DataFrame(), pd.DataFrame.from_records(), pd.DataFrame.from_dict() and pd.json_normalize().
When the dictionaries do not all share the same keys, Pandas automatically inserts NaN values for any missing entries. Custom indexing can be applied during conversion using the index parameter, while the columns parameter allows control over column order and selection. For large datasets, from_records() is generally the better-performing option, and for dictionaries containing nested structures, json_normalize() flattens the data into a suitable tabular format.
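The conversions described above can be sketched as follows, assuming Pandas is installed. The records are invented sample data, used only to show the missing-key, index, columns and nested-structure cases.

```python
import pandas as pd

# Invented sample data: the second record has an extra "tag" key.
flat_records = [
    {"name": "alpha", "size": 1},
    {"name": "beta", "size": 2, "tag": "x"},
]

# Plain constructor: the missing "tag" in the first row becomes NaN.
df = pd.DataFrame(flat_records)

# index sets row labels; columns selects and orders the columns.
df_custom = pd.DataFrame(
    flat_records, index=["a", "b"], columns=["size", "name"]
)

# from_records accepts the same list of dictionaries.
df_records = pd.DataFrame.from_records(flat_records)

# Nested dictionaries are flattened into dotted column names.
nested_records = [{"name": "gamma", "meta": {"owner": "ann", "year": 2022}}]
df_flat = pd.json_normalize(nested_records)

print(df, df_custom, df_records, df_flat, sep="\n\n")
```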