Project management software | Technology Tales

Version control of large files on GitHub

27^th August 2024

When you try pushing large files to a GitHub repository, you may find that you breach its 100 MB limit. When you do, you either need to buy a data pack or exclude the file from being tracked. In my case, I decided that the monthly fee for 50 GB was not overly onerous, so I added that. Excluding such files using the .gitignore functionality makes a lot of sense, too.

If you decide to proceed as I did, you will need to install git-lfs. Since that may vary by operating system, I am leaving to you to look for those details on the website that I have linked to earlier. Activating it for your user account needs the following:

git lfs install

Following that, you need to flag the file or type of file using a command like the following:

git lfs track "[file path with name or search pattern]"

Executing the above adds the file path including the file name or the search pattern (normal operating system wildcards like * work here) to a file named .gitattributes in the root of the repository folder hierarchy. If that file no longer exists, it will get created the first time that this is done. It will also need to be added to the repository using git add like any other file. A general command like the following will also do it anyway, since it covers everything in the relevant folder:

git add .

After making a commit, the next step is to push the contents into GitHub. At this stage, the large file or files will be recognised and sent to large file storage with only a text link in the main area. Everything else will be handled as normal.

While on this subject, I need to add a few words of warning. Pushing a large file to GitHub without doing things up front will cause the operation to fail. That may make the transition over to large file storage all the more tricky, since things will be out of order. Moving everything to a temporary folder and again cloning the repository was how I got out of this impasse when it happened to me. Then, I could get the large file handling set up before getting going again. It is better to sort things like this out at the start of the process, rather than attempting to remedy things part way through the process.

Keeping a file or directory out of a Git or GitHub repository

26^th August 2024

Recently, I have begun to do more version control of files with Git and GitHub. However, GitHub is not a place to keep files with log in credentials. Thus, I wanted to keep these locally but avoid having them being tracked in either Git or GitHub.

Adding the names to a .gitignore file will avoid their inclusion prospectively, but what can you do if they get added in error before you do? The answer that I found is to execute a command like the following:

git rm -r --cached [path to file or directory with its name]

That takes it out of the staging area and allows the .gitignore functionality to do its job. The -r switch makes the command recursive, should you be working with the contents of a directory. Then, the --cached flag is what does the removal from the staging area.

While the aforementioned worked for me when I had an oversight, the following is also suggested:

git update-index --assume-unchanged [path to file or directory with its name]

That may be working without a .gitignore file, which was not how I was doing things. Nevertheless, it may have its uses for someone else, so that is why I include it above.