Technology Tales

Notes drawn from experiences in consumer and enterprise technology

23:46, 3rd February 2023

When to Use a List Comprehension in Python

Python list comprehensions offer a concise and readable way to create and transform lists by combining a loop and optional conditional logic into a single line, serving as an alternative to traditional for loops and the map() function. They support filtering through conditional statements, can be extended to create sets and dictionaries, and the walrus operator (:=) introduced in Python 3.8 allows values to be assigned within a comprehension's conditional clause.

However, they are not always the best choice. Nested comprehensions can reduce code clarity, and because a comprehension builds the entire list in memory, it is unsuitable for very large datasets. Generator expressions are more appropriate there, since they evaluate values lazily and maintain a smaller memory footprint. When performance is a priority, timing tools such as the timeit module can help determine which approach, whether a list comprehension, map() or a standard loop, is most efficient for a given situation.
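The points above can be sketched in a few lines. This is a minimal illustration rather than code from the linked article; the sample data and variable names are made up for the purpose.

```python
import sys

# Hypothetical sample data for illustration.
numbers = [3, 8, 15, 22, 40]

# List comprehension with a filter: squares of the even numbers only.
even_squares = [n * n for n in numbers if n % 2 == 0]

# Dictionary and set comprehensions use braces instead of brackets.
square_by_number = {n: n * n for n in numbers}
remainders = {n % 3 for n in numbers}

# The walrus operator (Python 3.8+) keeps the computed value for reuse,
# so the doubling is written once rather than in both filter and output.
large_doubles = [d for n in numbers if (d := n * 2) > 20]

# A generator expression produces values lazily instead of building a list.
lazy_total = sum(n * n for n in range(1_000_000))

# The list holds every element at once; the generator object stays small.
list_size = sys.getsizeof([n for n in range(100_000)])
generator_size = sys.getsizeof(n for n in range(100_000))

print(even_squares, large_doubles, list_size > generator_size)
```

Note that sys.getsizeof() reports only the size of the container object itself, which is enough to show the difference in footprint here.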

23:44, 3rd February 2023

How to return multiple values from a function in Python

Returning multiple values from a Python function can be achieved by separating them with commas in the return statement, which results in a single tuple being returned. The caller can then unpack that tuple into separate variables for individual use. Alternatively, a list can be returned by enclosing the values in square brackets, which gives the caller a mutable data structure instead. Both methods deliver multiple results from one call, with the tuple method being particularly concise and commonly used in practice.
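As a brief sketch of both approaches, with function names invented for this example:

```python
def min_max_mean(values):
    """Return the smallest value, the largest value and the mean as a tuple."""
    return min(values), max(values), sum(values) / len(values)


def min_max_as_list(values):
    """Return the smallest and largest values in a list instead."""
    return [min(values), max(values)]


# The comma-separated return produces one tuple...
result = min_max_mean([4, 10, 1])

# ...which can be unpacked into separate variables.
lowest, highest, mean = result

bounds = min_max_as_list([4, 10, 1])

print(result, lowest, highest, mean, bounds)
```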

23:43, 3rd February 2023

Parallel For-Loop With a Multiprocessing Pool

Python's multiprocessing.Pool class allows sequential for loops to be converted into parallel operations that utilise all available CPU cores simultaneously. Before parallelising a loop, the code must be refactored so that each iteration calls a self-contained target function with its own arguments and no reliance on shared resources, which helps avoid concurrency issues such as race conditions.

The pool's map() function handles single-argument tasks and returns results once all tasks are complete, while starmap() serves the same purpose for functions requiring multiple arguments. For greater memory efficiency and responsiveness, imap() issues tasks lazily and returns an iterator that yields results in submission order as they become available, rather than waiting for the entire batch to complete. Its sibling imap_unordered() yields each result as soon as its task finishes, regardless of order.

By default, the pool creates one worker process per logical CPU core, though this can be adjusted manually via the processes argument. The number of logical CPU cores on a system can be retrieved using either multiprocessing.cpu_count() or os.cpu_count(). On processors with simultaneous multithreading (hyper-threading), the logical core count is often twice the number of physical cores. The pool is best suited to computationally intensive tasks involving small amounts of data, whereas input/output-bound tasks are generally better handled using thread-based concurrency such as ThreadPoolExecutor.
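A minimal sketch of the pool methods described above might look as follows. The task functions cube() and power() are hypothetical stand-ins for real work, and the choice of four worker processes is arbitrary.

```python
import os
from multiprocessing import Pool


def cube(n):
    """Self-contained single-argument task with no shared state."""
    return n ** 3


def power(base, exponent):
    """Self-contained task taking two arguments, for use with starmap()."""
    return base ** exponent


# The guard stops worker processes from re-running this section when the
# module is imported by a freshly started child process.
if __name__ == "__main__":
    print("Logical CPU cores:", os.cpu_count())

    with Pool(processes=4) as pool:
        # map() blocks until every task has finished, then returns a list.
        cubes = pool.map(cube, range(8))

        # starmap() unpacks each tuple into the function's arguments.
        powers = pool.starmap(power, [(2, 3), (3, 2), (10, 4)])

        # imap() returns an iterator, so results can be consumed in
        # submission order while later tasks are still running.
        for value in pool.imap(cube, range(8)):
            print(value)

    print(cubes)
    print(powers)
```

Defining the task functions at module level matters here, because the pool has to pickle a reference to them in order to send work to the child processes.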

23:42, 3rd February 2023

Pandas – Create DataFrame From Multiple Series

Creating a DataFrame from multiple Pandas Series involves using the concat() function to merge them as columns, aligning their index values and filling gaps with NaN where Series lengths differ. Each Series can be assigned a name to serve as a column header, and custom index labels can be applied to both Series and the resulting DataFrame. Techniques such as reset_index() allow for reorganising the DataFrame's structure, while additional Series can be appended to existing DataFrames by specifying new column names. This approach facilitates the construction of structured tabular data from individual Series, ensuring flexibility in data alignment and formatting.
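The following sketch shows one way those steps could fit together. It assumes pandas is installed, and the Series contents and labels are invented sample data.

```python
import pandas as pd

# Two named Series of different lengths with custom index labels.
names = pd.Series(["Ann", "Bob", "Cid"], index=["a", "b", "c"], name="name")
ages = pd.Series([34, 27], index=["a", "b"], name="age")

# axis=1 places each Series in its own column, aligned on the index.
# The missing age for label "c" is filled with NaN.
people = pd.concat([names, ages], axis=1)

# A further Series can be added by assigning it to a new column name.
people["city"] = pd.Series(["Leeds", "York", "Bath"], index=["a", "b", "c"])

# reset_index() moves the labels into an ordinary column named "index"
# and replaces them with a default integer index.
flat = people.reset_index()

print(people)
print(flat)
```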

23:41, 3rd February 2023

How to run command or code in parallel in bash shell under Linux or Unix

Running multiple commands or code simultaneously in a Bash shell on Linux and Unix-like systems can be achieved through several methods. The simplest approach involves appending an ampersand to a command to push it into the background, with the built-in wait command used to pause execution until all background processes have completed before continuing. This allows multiple processes to run concurrently within a script. A more powerful option is GNU parallel, a shell tool that executes jobs across one or more computers simultaneously, accepting input such as file lists, URLs or hostnames, and offering features like job limiting, SSH-based remote execution and filename replacement strings for batch operations such as image conversion or file compression. The xargs command also provides parallel execution capabilities. GNU parallel can be installed on Debian and Ubuntu systems via apt, on RHEL and CentOS via yum and on Fedora via dnf.

23:39, 3rd February 2023

How to Rename Columns in Pandas

Renaming columns in a Pandas DataFrame can be achieved through three primary approaches. The first method involves specifying individual column names and their new labels using the rename function, which allows selective updates without altering other column names. The second method replaces all column names simultaneously by directly assigning a new list of names to the DataFrame's columns attribute, which is particularly efficient when renaming most or all columns. The third method uses string manipulation to replace specific characters across all column names, such as removing a common prefix or suffix, which is useful for standardising naming conventions. Each technique offers flexibility depending on the scope and nature of the renaming task, ensuring that users can efficiently manage column labels in their datasets.
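The three approaches can be sketched as below, assuming pandas is installed. The DataFrame and its raw_ prefix are hypothetical sample data.

```python
import pandas as pd

df = pd.DataFrame(
    {"raw_id": [1, 2], "raw_name": ["x", "y"], "raw_score": [0.5, 0.9]}
)

# 1. rename() changes only the columns listed in the mapping and
#    returns a new DataFrame, leaving the original untouched.
renamed = df.rename(columns={"raw_id": "identifier"})

# 2. Assigning a list to .columns replaces every name at once.
#    The list must have exactly one entry per column.
relabelled = df.copy()
relabelled.columns = ["id", "name", "score"]

# 3. String methods on .columns apply one edit to every name.
#    regex=False treats the pattern as literal text.
stripped = df.copy()
stripped.columns = stripped.columns.str.replace("raw_", "", regex=False)

print(list(renamed.columns))
print(list(relabelled.columns))
print(list(stripped.columns))
```

One caveat with the third approach: str.replace() removes the text wherever it occurs in a name, not only at the start, so it suits prefixes that do not appear elsewhere in the labels.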

23:28, 3rd February 2023

Python: Iterate over multiple lists simultaneously

Python offers several approaches to iterating over multiple lists simultaneously, enabling related data to be processed in step. The zip() function pairs elements from each list, stopping when the shortest list ends, while itertools.zip_longest() continues until all lists are exhausted, filling missing values with None or a specified fill value. Also, enumerate() allows tracking of indexes to access corresponding elements from other lists. Because zip() returns a lazy iterator in Python 3, it combines well with generator expressions for memory-efficient iteration over large datasets. Each method has distinct advantages depending on the requirements of the task, such as handling varying list lengths or optimising memory usage.
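A short sketch of these options, using invented sample lists:

```python
from itertools import zip_longest

names = ["Ann", "Bob", "Cid"]
scores = [91, 78]
cities = ["Leeds", "York", "Bath"]

# zip() stops at the end of the shortest list, so "Cid" is dropped.
pairs = list(zip(names, scores))

# zip_longest() keeps going and pads the shorter list with fillvalue.
padded = list(zip_longest(names, scores, fillvalue=0))

# enumerate() supplies an index for looking up the matching element
# in another list of the same length.
indexed = [(i, name, cities[i]) for i, name in enumerate(names)]

# zip() is lazy, so a generator expression over it never builds the
# intermediate list of pairs.
total_length = sum(len(name) + len(city) for name, city in zip(names, cities))

print(pairs, padded, indexed, total_length)
```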

17:55, 20th January 2023

Python: Check if a File or Directory Exists

Python offers two main modules for checking whether a file or directory exists on a system. The os module, part of Python's standard library, provides the os.path.exists() function to check whether a given path exists, os.path.isfile() to confirm whether a path points to a regular file and os.path.isdir() to determine whether a path refers to a directory. Both isfile() and isdir() follow symbolic links. The pathlib module, also part of the standard library, offers an object-oriented alternative: instantiate a Path object and call its exists() method to achieve the same result. Both modules are built into Python and provide reliable, portable ways to verify the presence of files and directories before performing further operations on them.
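The checks can be tried safely against a temporary directory, as in this sketch. The file name example.txt is arbitrary.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    file_path = os.path.join(folder, "example.txt")
    Path(file_path).write_text("hello")

    checks = {
        # os.path functions take the path as a string argument.
        "folder exists": os.path.exists(folder),
        "folder is dir": os.path.isdir(folder),
        "file is file": os.path.isfile(file_path),
        "file is dir": os.path.isdir(file_path),
        # pathlib wraps the path in an object and calls methods on it.
        "pathlib exists": Path(file_path).exists(),
        "pathlib is_file": Path(file_path).is_file(),
    }

# The temporary directory has been removed by this point.
exists_afterwards = Path(file_path).exists()

print(checks, exists_afterwards)
```

A check followed by a separate operation leaves a small window in which the file can change, so code that must be robust often attempts the operation directly and handles the resulting exception instead.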

11:31, 20th January 2023

Lists in Groovy

Groovy extends Java's list handling by offering additional methods and shorthand syntax for creating, modifying and manipulating lists, including operations such as adding, updating and removing items, as well as filtering, sorting and transforming elements. Lists can be created using dynamic typing and literal syntax, with options to specify different implementations like LinkedList or ArrayList.

Retrieving items supports both positive and negative indexing, while methods like each(), find() and grep() simplify iteration and filtering. Sorting can be done using natural ordering or custom comparators and operations such as collecting, joining and checking uniqueness further enhance list manipulation. The language also provides tools for iterating through elements, applying conditions and generating new lists based on specific criteria, demonstrating Groovy's enhancements to the Java Collections API for more efficient and expressive data handling.

10:13, 12th January 2023

Open Graph protocol

The Open Graph protocol allows web pages to function as rich objects within social networks by using metadata tags in the HTML head, specifying properties such as title, type, image and canonical URL. These tags enable consistent representation across platforms, with optional elements for additional details like descriptions, locales and media. Structured properties provide extended information for images, videos and audio, while predefined object types serve categories like music, video, articles and books, each with specific attributes. The protocol draws on existing standards and is supported by various tools and libraries, facilitating integration and enhancing how content is shared and displayed online.
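To make the tag structure concrete, here is a small sketch that reads Open Graph properties from a page using only Python's standard library. The sample HTML, the example.com URLs and the OpenGraphParser class are all invented for illustration; this is not one of the protocol's official tools.

```python
from html.parser import HTMLParser

# Hypothetical page head carrying the four basic properties
# (title, type, image, url) plus an optional description.
SAMPLE_PAGE = """
<html><head>
<title>Example</title>
<meta property="og:title" content="An Example Article" />
<meta property="og:type" content="article" />
<meta property="og:image" content="https://example.com/cover.png" />
<meta property="og:url" content="https://example.com/article" />
<meta property="og:description" content="Optional summary text." />
<meta name="viewport" content="width=device-width" />
</head><body></body></html>
"""


class OpenGraphParser(HTMLParser):
    """Collect og:* properties from <meta> tags into a dictionary."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("property") or ""
        if name.startswith("og:"):
            # Simplification: a repeated property (for example several
            # og:image tags) overwrites the earlier value here.
            self.properties[name] = attributes.get("content") or ""


parser = OpenGraphParser()
parser.feed(SAMPLE_PAGE)
og = parser.properties

print(og)
```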
