Coding Notebook

18:12, 2nd January 2022

The xcopy command is a Windows utility that copies files and directories, including subdirectories, and is available in Windows 10, Windows 11 and several versions of Windows Server. It offers a wide range of optional parameters that control how files are copied, including options to copy hidden or read-only files, verify copied files against their originals, copy only files changed on or after a specified date, and retain file attributes as well as ownership and access control list (ACL) information.

It can operate across network connections in restartable mode, allowing interrupted transfers to resume once connectivity is restored, and supports network compression during transfers. Unlike the diskcopy command, xcopy does not require the source and destination storage to share the same format, making it more flexible for cross-format operations. It can also be used within batch programmes, with exit codes allowing scripts to handle errors such as insufficient memory, failed writes or user-initiated cancellations in a structured way.
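The exit codes can be checked from a script rather than only from a batch file. To stay in this notebook's language, the sketch below does it from Julia. The code-to-meaning table follows Microsoft's xcopy documentation, while `describe_xcopy_exit` and `run_xcopy` are hypothetical helpers and the `/E /I /Y` switches are just an example choice.

```julia
# Exit codes documented for xcopy in Microsoft's command reference.
const XCOPY_EXIT_CODES = Dict(
    0 => "files were copied without error",
    1 => "no files were found to copy",
    2 => "the user pressed Ctrl+C to terminate xcopy",
    4 => "initialisation error (not enough memory or disk space, invalid drive name or syntax)",
    5 => "disk write error",
)

# Hypothetical helper: translate an exit code into a readable message.
describe_xcopy_exit(code::Integer) =
    get(XCOPY_EXIT_CODES, code, "unrecognised xcopy exit code $code")

# Hypothetical helper: run xcopy and return (exit code, message). Windows only.
function run_xcopy(src::AbstractString, dst::AbstractString)
    Sys.iswindows() || error("xcopy is only available on Windows")
    # ignorestatus stops run() from throwing on a non-zero exit code,
    # so the script can inspect the code instead of aborting.
    proc = run(ignorestatus(`xcopy $src $dst /E /I /Y`))
    return proc.exitcode, describe_xcopy_exit(proc.exitcode)
end
```

Keeping the table separate from the runner means the same messages can be reused when xcopy is launched some other way, for example from a scheduled task.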

18:11, 2nd January 2022

XCOPY command: syntax and examples

Copying files with XCOPY involves specifying parameters that set the source and destination paths, filter files by modification date and manage exclusions. The process requires careful configuration of switches such as /D for date-based filtering, /E for recursive directory copying (including empty subdirectories) and /EXCLUDE to omit specific files or folders. Challenges arise from date format discrepancies, conflicting exclusion patterns and the need to suppress confirmation prompts (for example with /Y) during automated transfers. Commands should be tested thoroughly to confirm that the intended files are selected and that nothing is unintentionally omitted or included.
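Building the command line programmatically is one way to keep these switches consistent between runs. The sketch below assembles, but does not run, an xcopy invocation from Julia. `xcopy_cmd` and its keyword names are hypothetical, the month-day-year order for /D follows Microsoft's documentation and should be checked against the target machine's regional settings, and the paths are placeholders.

```julia
using Dates

# Hypothetical helper: build (without running) an xcopy command line.
#   /E              copy subdirectories, including empty ones
#   /I              treat a non-existent destination as a directory (no prompt)
#   /Y              overwrite existing files without asking for confirmation
#   /D:mm-dd-yyyy   copy only files changed on or after that date
#   /EXCLUDE:f1+f2  skip paths containing any string listed in files f1, f2, ...
function xcopy_cmd(src::AbstractString, dst::AbstractString;
                   since::Union{Date,Nothing} = nothing,
                   exclude_files::Vector{String} = String[])
    args = String[src, dst, "/E", "/I", "/Y"]
    if since !== nothing
        # Month-day-year order, as shown in the xcopy documentation.
        push!(args, "/D:" * Dates.format(since, "mm-dd-yyyy"))
    end
    if !isempty(exclude_files)
        push!(args, "/EXCLUDE:" * join(exclude_files, "+"))
    end
    return `xcopy $args`   # a vector is spliced in as separate arguments
end

cmd = xcopy_cmd("C:\\projects", "D:\\backup"; since = Date(2021, 12, 1),
                exclude_files = ["excludes.txt"])
# On Windows, run(cmd) would execute it.
```

Inspecting `cmd.exec` before running is a cheap way to test the date and exclusion arguments without touching any files.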

15:41, 25th December 2021

DataFrames.jl mini-language explained

Mastering the transformation mini-language in DataFrames.jl involves understanding how to manipulate data through various styles of operations. This includes applying functions to grouped data, specifying source and target columns using the => syntax and handling special cases such as grouping keys.

This requires familiarity with how returning different types, such as a NamedTuple or an AbstractMatrix, shapes the structure of the result. Several transformation styles can be combined in a single operation to aggregate values, rename columns and preserve or modify grouping keys, and traditional function-based transformations can be mixed with the newer mini-language syntax, which keeps data processing flexible.
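A minimal sketch of several mini-language forms used together in one `combine` call, assuming DataFrames.jl 1.x. The data frame and the target names `x_mean`, `xy`, `lo` and `hi` are invented for illustration.

```julia
using DataFrames, Statistics

df  = DataFrame(g = [1, 1, 2, 2], x = [1.0, 2.0, 3.0, 4.0], y = [10, 20, 30, 40])
gdf = groupby(df, :g)

res = combine(gdf,
    # source => function => target
    :x => mean => :x_mean,
    # several source columns are passed to the function positionally
    [:x, :y] => ((x, y) -> sum(x .* y)) => :xy,
    # a NamedTuple result is spread into columns when the target is AsTable
    :x => (x -> (lo = minimum(x), hi = maximum(x))) => AsTable,
    # special form: number of rows in each group
    nrow)
# The grouping key :g is kept as the first column (keepkeys = true by default).
```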

15:39, 25th December 2021

Tips to create beautiful, publication-quality plots in Julia

Leandro Martínez outlines practical techniques for producing publication-quality plots with the Plots library and the GR backend. The techniques cover multi-panel layouts, sequential colour schemes, overlapping bar charts with transparency, matching histogram bin widths across datasets with differing ranges, scatter plots with non-linear exponential fits and colour-consistent annotations, and the use of LaTeXStrings to render axis labels in sans-serif fonts. Tick labels can also be written in scientific notation with LaTeX styling by combining the Formatting and LaTeXStrings packages, giving precise control over the appearance of numerical axes. Saving figures as PDF preserves vector quality, and when a bitmap is required they can be converted to high-resolution TIFF or PNG files with image editing software.
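One way to match bin widths across datasets with different ranges is to give each dataset its own edges built from a shared step. The sketch below computes such edges in plain Julia. `common_width_edges` and `bin_counts` are hypothetical helpers, the article's own approach may differ, and bins are assumed to be left-closed, as in StatsBase histograms.

```julia
# Hypothetical helper: bin edges of fixed width w spanning the data in x.
# Bins are treated as left-closed [a, b), so the last edge is placed
# strictly above maximum(x).
function common_width_edges(x::AbstractVector{<:Real}, w::Real)
    lo = floor(minimum(x) / w) * w
    hi = (floor(maximum(x) / w) + 1) * w
    return lo:w:hi
end

# Hypothetical helper: count values per left-closed bin, to check the edges.
function bin_counts(x, edges)
    counts = zeros(Int, length(edges) - 1)
    for xi in x
        counts[searchsortedlast(edges, xi)] += 1
    end
    return counts
end

a  = [0.3, 1.2, 2.7]       # narrow range
b  = [10.1, 14.9, 25.0]    # much wider range
ea = common_width_edges(a, 0.5)
eb = common_width_edges(b, 0.5)
# With Plots.jl, histogram(a; bins = ea) followed by histogram!(b; bins = eb)
# would then draw bars of identical width.
```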

14:38, 25th December 2021

One thousand and one stories in Julia

A blog post by Bogumił Kamiński, written to mark his 1,000th answer on Stack Overflow for the Julia programming language, explores the surprising and sometimes counterintuitive behaviour of Julia's sum function. Using practical examples, he demonstrates that sum does not simply use the standard addition operator but instead relies on an internal reduction operator, Base.add_sum, which widens small integer types such as Int8 when the elements are scalars but does not do so when the elements are vectors, leading to unexpected results such as integer overflow.

He further shows that sum uses reduce rather than foldl, meaning the order of operations is not guaranteed and can produce inconsistent outcomes depending on collection size and element types. Additional quirks arise when using the init keyword argument with floating-point numbers, where differing summation orders produce slightly different results, a known consequence of IEEE 754 floating-point arithmetic. His practical advice is to ensure collections contain elements of a consistent and appropriate type, and to treat floating-point results as approximations rather than exact values.
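The integer behaviour and the floating-point point can both be reproduced in a few lines. This sketch relies on the current behaviour of the internal `Base.add_sum` in Julia 1.x, which is an implementation detail rather than a documented API.

```julia
v = Int8[100, 100]

# sum reduces with Base.add_sum, which widens small signed integers to Int...
s_widened = sum(v)          # 200, an Int
# ...whereas reducing with plain + stays in Int8 and wraps around.
s_wrapped = reduce(+, v)    # Int8(-56)

# For a vector of vectors, add_sum falls back to +, so nothing is widened.
s_vectors = sum([Int8[100], Int8[100]])   # Int8[-56]

# Floating-point addition is not associative, so summation order matters.
left  = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)   # differs slightly from `left`
```

Comparing `left` and `right` with `isapprox` rather than `==` is the practical counterpart of treating floating-point sums as approximations.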

11:39, 25th December 2021

DTable – an early performance assessment of a new distributed table implementation

Addressing the need for efficient out-of-core processing of large tabular datasets, a new distributed table implementation in Julia leverages existing tools such as Dagger and Tables.jl to enable scalable computation across multiple threads, workers and machines. Early performance evaluations highlight significant improvements over direct competitors like Dask in several benchmarks, particularly for reduction and shuffle operations, though challenges remain in optimising thread safety and reducing overhead for smaller datasets.

The project is actively developed as part of the Dagger.jl package, with the current focus on refining algorithms and expanding compatibility with a broader range of data processing tasks. While initial results demonstrate potential for handling complex tabular workloads, further refinement is required to enhance stability, scalability and integration with Julia's ecosystem. Users are encouraged to explore the implementation through the available documentation and to contribute feedback to shape its future development.

18:08, 24th December 2021

Working with dates and times in Julia involves using the Dates standard library to create, manipulate and format date and time objects. It includes functions for constructing specific dates, extracting components such as hours or minutes and converting between representations such as ISO 8601 strings or Unix epoch time. Dates can be defined at varying levels of detail and formatted with specific character codes, while tools such as strftime and unix2datetime help with conversions; time zones are handled by the separate TimeZones.jl package rather than by Dates itself. Together these give precise control over temporal data in applications.
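A short sketch of the operations listed above using only the Dates standard library. The specific date is arbitrary, and time zones are deliberately left out because they need TimeZones.jl.

```julia
using Dates

d  = Date(2021, 12, 24)               # date only
dt = DateTime(2021, 12, 24, 18, 8)    # date and time, millisecond resolution

# Extracting components
h, m = hour(dt), minute(dt)           # 18, 8

# Formatting with character codes: yyyy year, mm month, dd day, HH hour, MM minute
stamp = Dates.format(dt, "yyyy-mm-dd HH:MM")   # "2021-12-24 18:08"

# Parsing an ISO 8601 string back into a DateTime
parsed = DateTime("2021-12-24T18:08:00")

# Unix epoch conversions (seconds since 1970-01-01T00:00:00, no time zone attached)
epoch = datetime2unix(dt)
back  = unix2datetime(epoch)
```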

18:07, 24th December 2021

Descriptive Statistics in Julia

Performing descriptive statistics in Julia involves packages such as Distributions.jl, StatsBase.jl, CSV.jl, DataFrames.jl and StatsPlots.jl to analyse data characteristics, generate summaries and visualise results. The process includes installing and importing these packages, generating random data, calculating measures such as the mean, median, variance and standard deviation, creating data frames for structured manipulation and using functions such as describe and summarystats to derive insights. Data can be summarised separately for each level of a categorical variable with the by function (replaced by combine(groupby(...)) in current DataFrames.jl releases), while density plots and box-and-whisker plots help illustrate distributions and relationships within the data.
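The basic measures need only the Statistics standard library. In this sketch the data are a fixed small vector chosen so the results are easy to check by hand, and the per-category step is a plain-Julia stand-in for grouped analysis, not the DataFrames.jl workflow described above.

```julia
using Statistics

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

m   = mean(x)                    # 5.0
med = median(x)                  # 4.5
s2  = var(x)                     # sample variance, n - 1 denominator: 32/7
sd  = std(x; corrected = false)  # population standard deviation: 2.0

# Per-category means without DataFrames (stand-in for grouped summaries)
group = ["a", "a", "a", "a", "b", "b", "b", "b"]
group_means = Dict(g => mean(x[group .== g]) for g in unique(group))
```

`var` and `std` use the n - 1 denominator by default; `corrected = false` switches to n, which matters for small samples like this one.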

17:54, 24th December 2021

Choosing how to store strings in Julia involves weighing performance, memory usage and the characteristics of the data, and the recommendations depend on whether strings are treated as categorical data, benefit from pooled storage or can be stored inline for efficiency. For large datasets, categorical data should be handled with CategoricalArrays.jl, while strings with few unique values suit pooled storage with PooledArrays.jl. InlineStrings.jl offers memory-efficient types for short, uniform strings, and Symbols can be used for comparison-heavy work with immutable labels, although they occupy memory for the rest of the session. The decision comes down to balancing speed, memory constraints and the specific use case, and converting strings to the chosen representation immediately after loading the data is recommended.
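To make pooled storage concrete, the sketch below encodes a vector of strings as small integer references into a pool of unique values, which is the idea behind pooled and categorical arrays. `pool_strings` is a hypothetical Base-only illustration, not part of PooledArrays.jl or CategoricalArrays.jl, whose real APIs differ. The last line shows Symbol interning.

```julia
# Hypothetical helper: store each distinct string once in `pool`
# and represent the data as UInt32 indices into it.
function pool_strings(v::AbstractVector{<:AbstractString})
    pool  = String[]
    index = Dict{String,UInt32}()
    refs  = Vector{UInt32}(undef, length(v))
    for (i, s) in enumerate(v)
        # Look the string up; the first time it is seen, append it to the pool.
        refs[i] = get!(index, s) do
            push!(pool, s)
            UInt32(length(pool))
        end
    end
    return refs, pool
end

colours = ["red", "blue", "red", "red", "blue"]
refs, pool = pool_strings(colours)

# Symbols are interned: equal names give the very same object,
# so comparisons are cheap, but interned symbols are never freed.
same = Symbol("red") === Symbol("red")
```

With few unique values, the 4-byte references replace repeated string storage, which is where the memory saving comes from.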

17:38, 23rd December 2021

Clip your data with ClipData.jl

The ClipData.jl package for Julia provides a straightforward way to transfer tabular data between a Julia session and the system clipboard in both directions. Data scientists who regularly work with tools such as Google Sheets can copy a table to their clipboard and ingest it directly into Julia as a DataFrame, or conversely export a DataFrame from Julia back to the clipboard for pasting elsewhere.

The package handles both tabular data with headers, using the cliptable function, and arrays without headers, using the cliparray function, with column element types detected automatically in the process. Additional features include options for controlling how table cells are parsed, with further details available on the package's homepage.
