Well, I would have guessed that the size would eventually top out, since the bit patterns would start to repeat, but I just tried it:
    touch file
    gzip file -c > file.1
    ...
    gzip file.9 -c > file.10
And got:
      0 bytes: file
     25 bytes: file.1
     45 bytes: file.2
     73 bytes: file.3
    103 bytes: file.4
    122 bytes: file.5
    152 bytes: file.6
    175 bytes: file.7
    205 bytes: file.8
    232 bytes: file.9
    262 bytes: file.10
Here are 24,380 files plotted (this genuinely amazed me, actually):
(Plot of compressed file size versus iteration: http://research.engineering.wustl.edu/~schultzm/images/filesize.png)
I did not expect that kind of growth; I would have expected roughly linear growth, since each pass should just encapsulate the existing data in a header along with a dictionary of patterns. I intended to run through 1,000,000 files, but my system ran out of disk space before then.
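The per-pass overhead can be accounted for, at least approximately: per RFC 1952, every gzip member carries a 10-byte fixed header, the stored file name plus a NUL terminator, and an 8-byte CRC-32/length trailer, and an empty input deflates to a 2-byte block. That predicts 10 + 5 ("file" plus NUL) + 2 + 8 = 25 bytes for file.1, which matches the listing above. A quick sanity check of just the fixed overhead (assuming GNU gzip; -n suppresses the stored name and timestamp):

    # 10-byte header + 2-byte empty deflate block + 8-byte trailer = 20
    printf '' | gzip -n | wc -c    # prints 20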
If you want to reproduce this, here is a bash script to generate the files:
    #!/bin/bash
    touch file.0
    for ((i=0; i < 20000; i++)); do
        gzip file.$i -c > file.$(($i+1))
    done
    wc -c file.* | awk '{print $2 "\t" $1}' | sed 's/file.//' | sort -n > filesizes.txt
The resulting filesizes.txt is a sorted, tab-delimited file for your favorite plotting utility. (You will have to delete the "total" line by hand, or script it away as shown below.)
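One way to script both steps (my own variation, not part of the pipeline above): the awk condition drops the "total" line that wc emits when given several files, and the plotting one-liner assumes gnuplot is installed:

    # Drop the "total" line before sorting, then write the cleaned file:
    wc -c file.* | awk '$2 != "total" {print $2 "\t" $1}' | sed 's/file.//' | sort -n > filesizes.txt
    # Quick look at the curve (iteration on x, bytes on y):
    gnuplot -e "set xlabel 'iteration'; set ylabel 'bytes'; plot 'filesizes.txt' with lines; pause -1"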