Channel: How can I find/identify large commits in Git history? - Stack Overflow

Answer by friederbluemle for How to find/identify large commits in git history?

Step 1 Write all file SHA1s to a text file:

git rev-list --objects --all | sort -k 2 > allfileshas.txt

Step 2 Sort the blobs from biggest to smallest and write results to text file:

git gc && git verify-pack -v .git/objects/pack/pack-*.idx | egrep "^\w+ blob\W+[0-9]+ [0-9]+ [0-9]+$" | sort -k 3 -n -r > bigobjects.txt
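
A note on what the filter in Step 2 keeps: according to the git verify-pack documentation, each -v line for a non-deltified object has five fields (SHA-1, type, size, size in packfile, offset in packfile), while deltified objects carry two extra fields (depth and base SHA-1). The pattern above therefore matches only non-deltified blobs. Below is a minimal sketch of the same selection written with awk. The function name list_blobs_by_size is hypothetical, and unlike Step 2 it prints only the SHA-1 and the size rather than the whole line.

```shell
# Hypothetical helper (not part of the original answer).
# Reads `git verify-pack -v` style lines on stdin and prints
# "<sha> <size>" for non-deltified blobs, largest first.
list_blobs_by_size() {
    # NF == 5 keeps the five-field (non-deltified) lines only;
    # $3 is the object size in bytes.
    awk '$2 == "blob" && NF == 5 { print $1, $3 }' |
        sort -k 2,2nr
}
```

It could be used in place of the egrep/sort part of the pipeline, for example: git verify-pack -v .git/objects/pack/pack-*.idx | list_blobs_by_size | head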

Step 3a Combine both text files to get file name/sha1/size information:

for SHA in `cut -f 1 -d\  < bigobjects.txt`; do
    echo $(grep $SHA bigobjects.txt) $(grep $SHA allfileshas.txt) | awk '{print $1,$3,$7}' >> bigtosmall.txt
done;

Step 3b If your file names or path names contain spaces, use this variation of Step 3a instead. It uses cut rather than awk to select the desired columns, so everything from column 7 to the end of the line is kept, spaces included:

for SHA in `cut -f 1 -d\  < bigobjects.txt`; do
    echo $(grep $SHA bigobjects.txt) $(grep $SHA allfileshas.txt) | cut -d ' ' -f '1,3,7-' >> bigtosmall.txt
done;

Now you can look at the file bigtosmall.txt in order to decide which files you want to remove from your Git history.
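
The loop in Step 3 runs grep over both text files once per object, which can take a while on a large repository. As an alternative sketch (this is not the original answer's method), a single awk pass can perform the same join and also keeps paths containing spaces intact. The function name join_sizes_with_paths is hypothetical, and the code assumes the two files have the formats produced by Steps 1 and 2.

```shell
# Hypothetical helper (not part of the original answer).
# $1: size list in bigobjects.txt format  (<sha> blob <size> ...)
# $2: path list in allfileshas.txt format (<sha> <path>)
# Prints "<sha> <size> <path>" in the order of the size list.
join_sizes_with_paths() {
    awk '
        NR == FNR {                 # first file read: the SHA/path list
            sha = $1
            sub(/^[^ ]+ ?/, "")     # drop the SHA, keep the rest of the line
            path[sha] = $0          # path may contain spaces
            next
        }
        { print $1, $3, path[$1] }  # second file read: the size list
    ' "$2" "$1"
}
```

For example: join_sizes_with_paths bigobjects.txt allfileshas.txt > bigtosmall.txt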

Step 4 To perform the removal, replace myLargeFile.log below with the path of the file you identified (note that this part is slow, since it checks out every commit in your history to look for that file):

git filter-branch --tree-filter 'rm -f myLargeFile.log' HEAD

Source

Steps 1-3a were copied from Finding and Purging Big Files From Git History

EDIT

The article was deleted sometime in the second half of 2017, but an archived copy of it can still be accessed using the Wayback Machine.

