Technology Tales

Adventures in consumer and enterprise technology

A more elegant way to read and combine data from multiple CSV files in Julia

Published on 24th October 2025

When I was compiling financial information for my accountant recently, I needed to read in a number of CSV files and combine their contents to enable further processing. This was all in a Julia script, and there was a time when I would use an explicit loop to do this combination. However, I came across a better way to accomplish this that I am sharing here with you now. First, you need to define a list of files like this:

files = ["5-2024.csv", "6-2024.csv", "7-2024.csv", "9-2024.csv", "10-2024.csv", "11-2024.csv", "12-2024.csv", "1-2025.csv", "2-2025.csv", "3-2025.csv", "4-2025.csv"]
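For a longer run of months, a list like this could also be built from the contents of the folder instead of being typed out. What follows is only a minimal sketch using Julia's standard library: the function name monthly_csv_files is made up for illustration, and it assumes that every file follows the month-year.csv naming pattern seen above. Because readdir returns names in alphabetical order, the sketch sorts them by year and then by month so that the rows end up in chronological order.

```julia
# Hypothetical helper: list the monthly CSV files in `dir`, oldest first.
# Assumes names follow the "<month>-<year>.csv" pattern, e.g. "5-2024.csv".
function monthly_csv_files(dir::AbstractString)
    # Keep only names that match the expected pattern.
    names = filter(name -> occursin(r"^\d{1,2}-\d{4}\.csv$", name), readdir(dir))
    # Sort chronologically; alphabetical order would put "10-2024" before "5-2024".
    return sort(names; by = name -> begin
        month, year = parse.(Int, split(first(splitext(name)), "-"))
        (year, month)
    end)
end
```

Calling monthly_csv_files(dir) would then take the place of the hand-written list, with any file that does not match the pattern being ignored.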

While there are alternatives to the above, including globbing (matching file names with wildcards, for which a Julia package such as Glob.jl can be used), I decided to keep things simple for myself. Now we come to the line that does all the heavy lifting:

using CSV, DataFrames
df = vcat([CSV.read(dir * file, DataFrame; normalizenames = true, header = 5, skipto = 6, silencewarnings = true) for file in files]...)

Most of that line is an array comprehension ([***** for file in files]) that avoids the need for an explicit loop of the kind that I have used a few times in the past. It goes through each file in the list defined at the top, reading it into a data frame because DataFrame is given as the second argument to CSV.read. Here, dir is assumed to be a string holding the folder path; since * only joins strings together, it needs to end with a path separator. The normalizenames option turns column names into valid Julia identifiers, replacing spaces and other invalid characters with underscores. The header and skipto options tell CSV.read which line holds the column headings (the fifth) and on which line the data start (the sixth), respectively. Then, the silencewarnings option suppresses any warnings about missing columns or inconsistent rows; clearly, a check on the data frame is needed to ensure that all is in order if you wish to go the same route as I did.
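To see what those options do to a single file, here is a small sketch that writes a made-up statement to a temporary folder and reads it back. The bank name, account number and column headings are all invented for illustration; the only assumption carried over from my own files is that there are four lines of preamble before the headings.

```julia
using CSV, DataFrames

# Made-up statement: four lines of preamble, headings on line 5, data from line 6.
sample = """
Example Bank
Account: 00000000
Period: May 2024
Currency: EUR
Date,Paid Out,Paid In
2024-05-01,12.50,
2024-05-02,,100.00
"""

path = joinpath(mktempdir(), "5-2024.csv")
write(path, sample)

# Same options as in the line above, applied to the single sample file.
one_month = CSV.read(path, DataFrame; normalizenames = true, header = 5, skipto = 6, silencewarnings = true)

println(names(one_month))  # column names after normalisation
println(nrow(one_month))   # number of data rows read
```

Printing the column names and the row count like this is a quick way to confirm that the header and skipto values match the layout of your own files before combining many of them.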

The splat (...) operator takes the resulting array of data frames and passes them as individual arguments to the vcat function, which vertically concatenates them to create the df data frame. Just like suppressing warnings about missing columns or inconsistent rows during CSV file read time, this involves trust in the input data that everything is structured alike. By default, vcat raises an error if the column names differ between data frames, but it cannot tell you whether the values themselves make sense. Naturally, you need to do your own checks to ensure that everything is in order, as it was for me with what I had to do.
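One way to carry out such a check is sketched below, with two made-up data frames standing in for the results of CSV.read. The column names and values are invented, and the checks shown (matching column names and a matching row count) are only an illustration of the idea, not what my own script does. It also uses reduce(vcat, frames), which is another way of writing vcat(frames...) that avoids splatting a long list.

```julia
using DataFrames

# Made-up monthly data frames standing in for the results of CSV.read.
frames = [
    DataFrame(Date = ["2024-05-01"], Amount = [12.5]),
    DataFrame(Date = ["2024-06-01", "2024-06-02"], Amount = [7.0, 3.25]),
]

# Check that every data frame has the same column names in the same order.
expected = names(first(frames))
mismatched = findall(frame -> names(frame) != expected, frames)
isempty(mismatched) || error("Column names differ in data frames at positions $mismatched")

# reduce(vcat, frames) is another way to write vcat(frames...).
combined = reduce(vcat, frames)

# The combined row count should equal the sum of the individual row counts.
@assert nrow(combined) == sum(nrow, frames)
```

Reporting the positions of any mismatched data frames makes it easier to trace a problem back to the file that caused it.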
