This does not necessarily require a modification of grep , although you could get a more accurate progress bar with such a modification.
If you are grepping "thousands of files" with a single grep invocation, most likely you are using the -r option to search directories recursively. In that case it isn't even clear that grep knows how many files it is going to examine, because I believe it starts examining files before it has walked the entire directory structure. Walking the directory structure first would add to the total scan time (and, indeed, there is always a cost to reporting progress, which is why many traditional Unix utilities don't do it).
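If you did want a total count up front anyway, a minimal sketch of that pre-pass might look like the following (it is an extra walk over the tree, which is exactly the added cost mentioned above):

# Hypothetical pre-count: walk the tree once just to learn how many files there are.
# This is an extra pass over the directory structure, so it adds to total run time.
total=$(find . -type f | wc -l)
echo "about to scan $total files" >&2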
In any case, a simple but slightly inaccurate progress bar could be obtained by first building the complete list of files to be scanned, and then feeding them to grep in batches of some fixed size, maybe 100 files, or perhaps based on total file size. Small batches give more accurate progress reports, but they also add overhead, since each batch requires an additional grep process start-up, and process start-up time can be longer than grepping a small file. The progress report is updated once per batch, so you would want to choose a batch size that gives you frequent enough updates without adding too much overhead. Basing the batch size on total file size (using, for example, stat to get file sizes) would make the progress report more accurate, but adds the extra cost of getting every file's size first (a rough sketch of that variant appears after the basic script below).
One of the advantages of this strategy is that you can also run two or more greps in parallel, which can speed things up a bit.
In broad outline, here is a simple script (which just divides the files up by count rather than by size, and which doesn't attempt to parallelize):
# Requires bash 4 and Gnu grep
shopt -s globstar
files=(**)
total=${#files[@]}
for ((i=0; i<total; i+=100)); do
  echo $i/$total >>/dev/stderr
  grep -d skip -e "$pattern" "${files[@]:i:100}" >>results.txt
done
For simplicity, I use globstar ( ** ) to safely collect all the files into an array. If your bash version is too old, you can get the filenames by looping over the output of find, but that's not very efficient if you have a lot of files. Unfortunately, I don't know of a globstar expression which matches only files. ( **/ matches only directories.) Fortunately, GNU grep provides the -d skip option, which silently skips directories. That means the file count will be slightly inaccurate, since directories are counted as files, but it probably doesn't make much difference.
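As mentioned earlier, you could instead base the batches on total file size. Here is a rough sketch of one way to do that; it assumes GNU stat's -c %s for file sizes, and the 10 MB per batch is an arbitrary number you would want to tune:

# Sketch: batch by cumulative file size instead of by file count.
# Requires bash 4, GNU grep and GNU stat; the 10 MB batch limit is arbitrary.
shopt -s globstar
files=(**)
total=${#files[@]}
batch=()
size=0
processed=0
for f in "${files[@]}"; do
  [[ -f $f ]] || continue                    # skip directories and other non-files
  batch+=("$f")
  size=$((size + $(stat -c %s "$f")))
  if ((size >= 10000000)); then
    grep -e "$pattern" "${batch[@]}" >>results.txt
    processed=$((processed + ${#batch[@]}))
    echo "$processed/$total" >>/dev/stderr
    batch=()
    size=0
  fi
done
# Flush whatever is left in the final, partial batch.
((${#batch[@]})) && grep -e "$pattern" "${batch[@]}" >>results.txt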
You would probably want to make the progress report cleaner by using some console codes. The above is just to get you started.
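For example (a minimal sketch; a plain carriage return works on just about any terminal, while fancier cursor control would need tput), you could redraw a single status line in place instead of printing a new line per batch:

# Instead of   echo $i/$total >>/dev/stderr   in the loop above,
# overwrite one status line in place:
printf '\r%d/%d files' "$i" "$total" >&2
# ...and after the loop, move off the status line:
printf '\n' >&2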
The easiest way to split this over several processes would be to simply divide the list into X different segments and run X different loops, each with a different starting point (a rough sketch of that manual approach appears after the parallel example below). Chances are they won't all finish at the same time, though, which is not optimal. A better solution is GNU parallel. You could do something like this:
find . -type f -print0 | parallel -0 --progress -L 100 -m -j 4 grep -e "$pattern" > results.txt
(Here, -L 100 says that up to 100 files should be given to each grep instance, -j 4 runs four parallel processes, and -0 tells parallel to read the NUL-delimited filenames produced by -print0. I just pulled those numbers out of the air; you'll probably want to adjust them.)
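For completeness, here is a rough sketch of the manual split mentioned above: a fixed number of background loops, each taking every Nth batch. It reuses the files array and batch size of 100 from the earlier script, and the worker count of 4 is arbitrary; progress reporting is left out to keep it short:

# Sketch: 4 background workers, each handling every 4th batch of 100 files.
# Assumes the files/total/pattern setup from the earlier script.
njobs=4
for ((j=0; j<njobs; j++)); do
  (
    : >"results.$j.txt"                       # start each worker's output fresh
    for ((i=j*100; i<total; i+=njobs*100)); do
      grep -d skip -e "$pattern" "${files[@]:i:100}" >>"results.$j.txt"
    done
  ) &
done
wait                                          # let all workers finish
cat results.*.txt >results.txt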