Julia | Technology Tales

A more elegant way to read and combine data from multiple CSV files in Julia

24^th October 2025

When I was compiling financial information for my accountant recently, I needed to read in a number of CSV files and combine their contents to enable further processing. This was all in a Julia script, and there was a time when I would use an explicit loop to do this combination. However, I came across a better way to accomplish this that I am sharing here with you now. First, you need to define a list of files like this:

files = ["5-2024.csv", "6-2024.csv", "7-2024.csv", "9-2024.csv", "10-2024.csv", "11-2024.csv", "12-2024.csv", "1-2025.csv", "2-2025.csv", "3-2025.csv", "4-2025.csv"]

Where there are alternatives to the above, including globbing (using wildcards with a Julia package that works with these), I decided to keep things simple for myself. Now we come to the line that does all the heavy lifting:

df = vcat([CSV.read(dir * file, DataFrame, normalizenames = true, header = 5, skipto = 6; silencewarnings=true) for file in files]...)

Near the end, there is the list comprehension ([***** for file in files]) that avoids the need for an explicit loop that I have used a few times in the past. This loops through each file in the list defined at the top, reading it into a dataframe as per the DataFrame option. The normalizenames option replaces spaces with underscores and cleans up any invalid characters. The header and skipto options tell Julia where to find the column headings and where to start reading the file, respectively. Then, the silencewarnings option suppresses any warnings about missing columns or inconsistent rows; clearly a check on the data frame is needed to ensure that all is in order if you wish to go the same route as I did.

The splat (...) operator takes the resulting list of data frames and converts them into individual arguments passed to the vcat function that virtually concatenates them together to create the df data frame. Just like suppressing warnings about missing columns or inconsistent rows during CSV file read time, this involves trust in the input data that everything is structured alike. Naturally, you need to do your own checks to ensure that is the case, as it was for me with what I had to do.

Resolving Python UnicodeEncodeError messages issued while executing scripts using the Windows Command Line

14^th March 2025

Recently, I got caught out by this message when summarising some text using Python and Open AI's API while working within VS Code:

UnicodeEncodeError: 'charmap' codec can't encode characters in position 56-57: character maps to <undefined>

There was no problem on Linux or macOS, but it was triggered on the Windows command line from within VS Code. Unlike the Julia or R REPL's, everything in Python gets executed in the console like this:

& "C:/Program Files/Python313/python.exe" script.py

The Windows command line shell operated with cp1252 character encoding, and that was tripping up the code like the following:

with open("out.txt", "w") as file:
    file.write(new_text)

The cure was to specify the encoding of the output text as utf-8:

with open("out.txt", "w", encoding='utf-8') as file:
    file.write(new_text)

After that, all was well and text was written to a file like in the other operating systems. One other thing to note is that the use of backslashes in file paths is another gotcha. Adding an r before the quotes gets around this to escape the contents, like using double backslashes. Using forward slashes is another option.

with open(r"c:\temp\out.txt", "w", encoding='utf-8') as file:
    file.write(new_text)

Finding human balance in an age of AI code generation

12^th March 2025

Recently, I was asked about how I felt about AI. Given that the other person was not an enthusiast, I picked on something that happened to me, not so long ago. It involved both Perplexity and Google Gemini when I was trying to debug something: both produced too much code. The experience almost inspired a LinkedIn post, only for some of the thinking to go online here for now. A spot of brainstorming using an LLM sounds like a useful exercise.

Going back to the original question, it happened during a meeting about potential freelance work. Thus, I tapped into experiences with code generators over several decades. The first one involved a metadata-driven tool that I developed; users reported that there was too much imperfect code to debug with the added complexity that dealing with clinical study data brings. That challenge resurfaced with another bespoke tool that someone else developed, and I opted to make things simpler: produce some boilerplate code and let users take things from there. Later, someone else again decided to have another go, seemingly with more success.

It is even more challenging when you are insufficiently familiar with the code that is being produced. That happened to me with shell scripting code from Google Gemini that was peppered with some Awk code. There was no alternative but to learn a bit more about the language from Tutorials Point and seek out an online book elsewhere. That did get me up to speed, and I will return to these when I am in need again.

Then, there was the time when I was trying to get a Julia script to deal with Google Drive needing permissions to be set. This started Google Gemini into adding more and more error checking code with try catch blocks. Since I did not have the issue at that point, I opted to halt and wait for its recurrence. When it did, I opted for a simpler approach, especially with the gdrive CLI tool starting up a web server for completing the process of reactivation. While there are times when shell scripting is better than Julia for these things, I added extra robustness and user-friendliness anyway.

During that second task, I was using VS Code with the GitHub Copilot plugin. There is a need to be careful, yet that can save time when it adds suggestions for you to include or reject. The latter may apply when it adds conditional logic that needs more checking, while simple code outputting useful text to the console can be approved. While that certainly is how I approach things for now, it brings up an increasingly relevant question for me.

How do we deal with all this code production? In an environment with myriads of unit tests and a great deal of automation, there may be more capacity for handling the output than mere human inspection and review, which can overwhelm the limitations of a human context window. A quick search revealed that there are automated tools for just this purpose, possibly with their own learning curves; otherwise, manual working could be a better option in some cases.

After all, we need to do our own thinking too. That was brought home to me during the Julia script editing. To come up with a solution, I had to step away from LLM output and think creatively to come up with something simpler. There was a tension between the two needs during the exercise, which highlighted how important it is to learn not to be distracted by all the new technology. Being an introvert in the first place, I need that solo space, only to have to step away from technology to get that when it was a refuge in the first place.

For anyone with a programming hobby, they have to limit all this input to avoid being overwhelmed; learning a programming language could involve stripping out AI extensions from a code editor, for instance, LLM output has its place, yet it has to be at a human scale too. That perhaps is the genius of a chat interface, and we now have Agentic AI too. It is as if the technology curve never slackens, at least not until the current boom ends, possibly when things break because they go too far beyond us. All this acceleration is fine until we need to catch up with what is happening.

Clearing the Julia REPL during code development

23^rd September 2024

During development, there are times when you need to clear the Julia REPL. It can become so laden with content that it becomes hard to perform debugging of your code. One way to accomplish this is issuing the CTRL + L keyboard shortcut while focus is within the REPL; you need to click on it first. Another is to issue the following in the REPL itself:

print("\033c")

Here \033 is an escape code in octal format. It is often used in terminal control sequences. The c character is what resets the terminal to its initial state. Printing this sequence is what does the clearance, and variations can be used to clear other kinds of console screens too. That makes it a more generic solution.

Dropping to an underlying shell using the ; character is another possibility. Then, you can use the clear or cls commands as needed; the latter is for Windows systems.

One last option is to define a Julia function for doing this:

function clear_console()
    run(`clear`)  # or `cls` for Windows
end

Calling the clear_console function then clears the screen programmatically, allowing for greater automation. The run function is the one that sends that command in backticks to the underlying shell for execution. Even using that alone should work too.

Avoiding errors caused by missing Julia packages when running code on different computers

15^th September 2024

As part of an ongoing move to multi-location working, I am sharing scripts and other artefacts via GitHub. This includes Julia programs that I have. That has led me to realise that a bit of added automation would help iron out any package dependencies that arise. Setting up things as projects could help, yet that feels a little too much effort for what I have. Thus, I have gone for adding extra code to check on and install any missing packages instead of having failures.

For adding those extra packages, I instate the Pkg package as follows:

import Pkg

While it is a bit hackish, I then declare a single array that lists the packages to be checked:

pkglits =["HTTP", "JSON3", "DataFrames", "Dates", "XLSX"]

After that, there is a function that uses a try catch construct to find whether a package exists or not, using the inbuilt eval macro to try a using declaration:

tryusing(pkgsym) = try @eval using $pkgsym return true catch e return false end

The above function is called in a loop that both tests the existence of a package and, if missing, installs it:

for i in 1:length(pkglits) rslt = tryusing(Symbol(pkglits[i])) if rslt == false Pkg.add(pkglits[i]) end end

Once that has completed, using the following line to instate the packages required by later processing becomes error free, which is what I sought:

using HTTP, JSON3, DataFrames, Dates, XLSX

A way to survey hours of daylight for locations of interest

9^th September 2024

A few years back, I needed to get sunrise and sunset information for a location in Ireland. This was to help me plan visits to a rural location with a bus service going nearby, and I did not want to be waiting on the side of the road in the dark on my return journey. It ended up being a project that I undertook using the Julia programming language.

This had other uses too: one was the planning of trips to North America. This was how I learned that evenings in San Francisco were not as long as their counterparts in Ireland. Later, it had its uses in assessing the feasibility of seeing other parts of the Pacific Northwest during the month of August. Other matters meant that such designs never came to anything.

The Sunrise Sunset API was used to get the times for the start and end of daylight. That meant looping through the days of the year to get the information, but I needed to get the latitude and longitude information from elsewhere to fuel that process. While Google Maps has its uses with this, it is a manual and rather fiddly process. Sparing use of Nomintim's API is what helped with increasing the amount of automation and user-friendliness, especially what comes from OpenStreetMap.

Accessing using Julia's HTTP package got me the data in JSON format that I then converted into atomic vectors and tabular data. The end product is an Excel spreadsheet with all the times in UTC. A next step would be to use the solar noon information to port things to the correct timezone. It can be done manually in Excel and its kind, but some more automation would make things smoother.

Upgrading Julia packages

23^rd January 2024

Whenever a new version of Julia is released, I have certain actions to perform. With Julia 1.10, installing and updating it has become more automated thanks to shell scripting or the use of WINGET, depending on your operating system. Because my environment predates this, I found that the manual route still works best for me, and I will continue to do that.

Returning to what needs doing after an update, this includes updating Julia packages. In the REPL, this involves dropping to the PKG subshell using the ] key if you want to avoid using longer commands or filling your history with what is less important for everyday usage.

Previously, I often ran code to find a package was missing after updating Julia, so the add command was needed to reinstate it. That may raise its head again, but there also is the up command for upgrading all packages that were installed. This could be a time saver when only a single command is needed for all packages and not one command for each package as otherwise might be the case.

A look at the Julia programming language

19^th November 2022

Several open-source computing languages get mentioned when talking about working with data. Among these are R and Python, but there are others; Julia is another one of these. It took a while before I got to check out Julia because I felt the need to get acquainted with R and Python beforehand. There are others like Lua to investigate too, but that can wait for now.

With the way that R is making an incursion into clinical data reporting analysis following the passage of decades when SAS was predominant, my explorations of Julia are inspired by a certain contrariness on my part. Alongside some small personal projects, there has been some reading in (digital) book form and online. Concerning the latter of these, there are useful tutorials like Introduction to Data Science: Learn Julia Programming, Maths & Data Science from Scratch or Julia Programming: a Hands-on Tutorial. Like what happens with R, there are online versions of published books available free of charge, and they include Julia Data Science and Interactive Visualization and Plotting with Julia. Video learning can help too and Jane Herriman has recorded and shared useful beginner's guides on YouTube that start with the basics before heading onto more advanced subjects like multiple dispatch, broadcasting and metaprogramming.

This piece of learning has been made of simple self-inspired puzzles before moving on to anything more complex. That differs from my dalliance with R and Python, where I ventured into complexity first, not least because of testing them out with public COVID data. Eventually, I got around to doing that with Julia too, though my interest was beginning to wane by then, and Julia's abilities for creating multipage PDF files were such that the PDF Toolkit was needed to help with this. Along the way, I have made use of such packages as CSV.jl, DataFrames.jl, DataFramesMeta, Plots, Gadfly.jl, XLSX.jl and JSON3.jl, among others. After that, there is PrettyTables.jl to try out, and anyone can look at the Beautiful Makie website to see what Makie can do. There are plenty of other packages creating graphs, such as SpatialGraphs.jl, PGFPlotsX and GRUtils.jl. For formatting numbers, options include Format.jl and Humanize.jl.

So far, my primary usage has been with personal financial data together with automated processing and backup of photo files. The photo file processing has taken advantage of the ability to compile Julia scripts for added speed because just-in-time compilation always means there is a lag before the real work begins.

VS Code is my chosen editor for working with Julia scripts, since it has a plugin for the language. That adds the REPL, syntax highlighting, execution and data frame viewing capabilities that once were added to the now defunct Atom editor by its own plugin. While it would be nice to have a keyboard shortcut for script execution, the whole thing works well and is regularly updated.

Naturally, there have been a load of queries as I have gone along and the Julia Documentation has been consulted as well as Julia Discourse and Stack Overflow. The latter pair have become regular landing spots on many a Google search. One example followed a glitch that I encountered after a Julia upgrade when I asked a question about this and was directed to the XLSX.jl Migration Guides where I got the information that I needed to fix my code for it to run properly.

There is more learning to do as I continue to use Julia for various things. Once compiled, it does run fast like it has been promised. The syntax paradigm is akin to R and Python, but there are Julia-specific features too. If you have used the others, the learning curve is lessened but not eliminated completely. This is not an object-oriented language as such, but its functional nature makes it familiar enough for getting going with it. In short, the project has come a long way since it started more than ten years ago. There is much for the scientific programmer, but only time will tell if it usurped its older competitors. For now, I will remain interested in it.

Removing a Julia package using REPL or script commands

5^th October 2022

While I have been programming with SAS for a few decades, and it remains a linchpin in the world of clinical development in the pharmaceutical industry, other technologies like R and Python are gaining a foothold. Two years ago, I started to look at those languages with personal projects being a great way of facilitating this. In addition, I got to hear of Julia and got to try that too. That journey continues since I have put it into use for importing and backing up photos, and there are other possible uses too.

Recently, I updated Julia to version 1.8.2 but ran into a problem with the DataArrays package that I had installed, so I decided to remove it since it was added during experimentation. Though the Pkg package that is used for package management is documented, I had not got to that, which meant that some web searching ensued. It turns out that there are two ways of doing this. One uses the REPL: after pressing the ] key, the following command gets issued:

rm DataArrays

When all is done, pressing the delete or backspace keys returns things to normal. This also can be done in a script as well as the REPL, and the following line works in both instances:

using Pkg; Pkg.rm("DataArrays")

While the semicolon is used to separate two commands issued on the same line, they can be on different lines or issued separately just as well. Naturally, DataArrays is just an example here; you just replace that with the name of whatever other package you need to remove. Since we can get carried away when downloading packages, there are times when a clean-up is needed to remove redundant packages, so knowing how to remove any clutter is invaluable.

Accessing Julia REPL command history

4^th October 2022

In the BASH shell used on Linux and UNIX, the history command calls up a list of recent commands used and has many uses. There is a .bash_history file in the root of the user folder that logs and provides all this information, so there are times when you need to exclude some commands from there, but that is another story.

The Julia REPL environment works similarly to many operating system command line interfaces, so I wondered if there was a way to recall or refer to the history of commands issued. So far, I have not come across an equivalent to the BASH history command for the REPL itself, but there the command history is retained in a file like .bash_history. The location varies on different operating systems, though. On Linux, it is ~/.julia/logs/repl_history.jl while it is %USERPROFILE%\.julia\logs\repl_history.jl on Windows. While I tend to use scripts that I have written in VSCode rather than entering pieces of code in the REPL, the history retains its uses. Thus, I am sharing it here for others. In the past, the location changed, but these are the ones with Julia 1.8.2, the version that I have at the time of writing.

« Older Entries «