Step 1 Write all file SHA1s to a text file:
git rev-list --objects --all | sort -k 2 > allfileshas.txt
Step 2 Sort the blobs from biggest to smallest and write results to text file:
git gc && git verify-pack -v .git/objects/pack/pack-*.idx | egrep "^\w+ blob\W+[0-9]+ [0-9]+ [0-9]+$" | sort -k 3 -n -r > bigobjects.txt
Step 3a Combine both text files to get file name/sha1/size information:
for SHA in `cut -f 1 -d\ < bigobjects.txt`; do
    echo $(grep $SHA bigobjects.txt) $(grep $SHA allfileshas.txt) | awk '{print $1,$3,$7}' >> bigtosmall.txt
done;
Step 3b If your file names or path names contain spaces, try this variation of Step 3a. It uses cut instead of awk to select the desired columns, taking everything from column 7 to the end of the line so that spaces in the path are preserved:
for SHA in `cut -f 1 -d\ < bigobjects.txt`; do
    echo $(grep $SHA bigobjects.txt) $(grep $SHA allfileshas.txt) | cut -d ' ' -f '1,3,7-' >> bigtosmall.txt
done;
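The loops in Step 3a and 3b run grep twice for every blob, which can get slow on a repository with many objects. As a rough alternative sketch (not part of the original article), the same join can be done in a single awk pass. The function name join_big_to_small is made up for illustration, and it assumes the two input files have the layout produced by Steps 1 and 2:

```shell
# Hypothetical helper: join the Step 1 and Step 2 files in one pass.
# Prints "SHA-1 size path" in the order of the second file (biggest first).
join_big_to_small() {
  # $1: allfileshas.txt-style file ("SHA-1 path")
  # $2: bigobjects.txt-style file ("SHA-1 blob size size-in-pack offset")
  awk '
    NR == FNR {              # first file: remember the path for each SHA-1
      sha = $1
      sub(/^[^ ]+ ?/, "")    # strip the SHA-1 and separator, keep the whole path
      path[sha] = $0
      next
    }
    { print $1, $3, path[$1] }   # second file: SHA-1, size, looked-up path
  ' "$1" "$2"
}

# Usage (inside the repository, after Steps 1 and 2):
#   join_big_to_small allfileshas.txt bigobjects.txt > bigtosmall.txt
```

Because the path is taken as the rest of the line, names containing spaces should come through intact, so this covers both the 3a and 3b cases.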
Now you can look at the file bigtosmall.txt in order to decide which files you want to remove from your Git history.
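If bigtosmall.txt is long, it may help to look only at the first few entries with the sizes converted from bytes to something more readable. A minimal sketch, assuming each line has the "SHA-1 size path" layout written by Step 3a or 3b (show_biggest is a hypothetical name, not a Git command):

```shell
# Hypothetical helper: show the first N entries of a "SHA-1 size path" listing,
# with sizes in KiB (rounded down).
show_biggest() {
  # $1: bigtosmall.txt-style file, $2: number of entries to show (default 10)
  awk -v n="${2:-10}" '
    NR > n { exit }
    {
      size = $2
      sub(/^[^ ]+ [^ ]+ /, "")    # drop SHA-1 and size, keep the whole path
      printf "%d KiB\t%s\n", int(size / 1024), $0
    }
  ' "$1"
}

# Usage:
#   show_biggest bigtosmall.txt 20
```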
Step 4 To perform the removal (note this part is slow since it's going to examine every commit in your history for data about the file you identified):
git filter-branch --tree-filter 'rm -f myLargeFile.log' HEAD
Source
Steps 1-3a were copied from Finding and Purging Big Files From Git History
EDIT
The article was deleted sometime in the second half of 2017, but an archived copy of it can still be accessed using the Wayback Machine.