Grep - how to display a progress bar or status

Sometimes I grep thousands of files, and it would be nice to see some kind of progress (bar or status).

I know this is not trivial, because grep writes its search results to STDOUT. My usual workflow is to redirect the results to a file, and I would like the progress bar / status to be displayed on STDOUT or STDERR instead.

Would I need to change the grep source code?

The ideal command:

grep -e "STRING" --results="FILE.txt"

and progress:

 [curr file being searched], number x/total number of files 

written to STDOUT or STDERR

5 answers

This does not necessarily require modifying grep, although you could get a more accurate progress bar with such a modification.

If you are grepping "thousands of files" with a single grep invocation, you are most likely using the -r option to search directories recursively. In that case, it is not even clear that grep itself knows how many files it will examine, since I believe it starts examining files before it has walked the entire directory tree. Walking the directory tree first would probably add to the total scan time (and, indeed, there is always a cost to producing progress reports, which is why many traditional Unix utilities don't do it).

In any case, a simple but somewhat imprecise progress bar can be obtained by building the complete list of files to be scanned up front, and then feeding them to grep in batches of some fixed size, perhaps 100, or in batches based on total file size. Small batches give more fine-grained progress reports, but they also add overhead, since each batch requires an extra grep process start-up, and process start-up time can exceed the time needed to grep a small file. The progress report is updated once per batch, so you want a batch size that gives you regular updates without adding too much overhead. Basing the batch size on total file size (using, for example, stat to get each file's size) would make the progress report more accurate, but adds the extra cost of stat-ing every file.

One advantage of this strategy is that you can also run two or more greps in parallel, which can speed things up a bit.


Roughly speaking, here is a simple script (which batches files by count rather than by size, and which does not attempt to parallelize):

    # Requires bash 4 and GNU grep
    shopt -s globstar
    files=(**)
    total=${#files[@]}
    for ((i=0; i<total; i+=100)); do
        echo "$i/$total" >>/dev/stderr
        grep -d skip -e "$pattern" "${files[@]:i:100}" >>results.txt
    done

For simplicity, I use globstar ( ** ) to safely collect all the files into an array. If your bash version is too old, you can do it by looping over find output instead, but that is not very efficient if you have a lot of files. Unfortunately, I do not know how to write a globstar expression that matches only files. ( **/ matches only directories.) Fortunately, GNU grep provides the -d skip option, which silently skips directories. That means the file count will be slightly inaccurate, since directories are counted along with files, but it probably doesn't matter much.
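If your bash lacks globstar, a sketch of the find-based variant (my own, not part of the answer; it assumes bash 4.4+ for mapfile -d, and the pattern and directory are placeholders):

```shell
#!/usr/bin/env bash
# Build the file list with find instead of globstar. -print0 plus
# mapfile -d '' handles file names containing spaces or newlines.
# Only regular files are listed, so no -d skip is needed and the
# count is exact. "$pattern" is a placeholder for your search string.
pattern=${pattern:-needle}
mapfile -d '' files < <(find . -type f -print0)
total=${#files[@]}
for ((i=0; i<total; i+=100)); do
    echo "$i/$total" >&2
    grep -e "$pattern" "${files[@]:i:100}" >>results.txt
done
```

Because find only emits regular files, this also sidesteps the directory-counting inaccuracy mentioned above.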

You will probably want to make the progress report cleaner by using some console codes. The above is just to get you started.
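For instance (a sketch of my own, not from the answer), a carriage return plus the ANSI erase-to-end-of-line code keeps the report on a single console line instead of scrolling:

```shell
#!/usr/bin/env bash
# Redraw the progress report in place: \r returns the cursor to
# column 0, and \033[K (ANSI "erase to end of line") clears any
# leftover characters from a previous, longer message.
# The numbers are illustrative; a grep batch would run inside the loop.
total=500
for ((i=0; i<total; i+=100)); do
    printf '\r\033[K%d/%d files scanned' "$i" "$total" >&2
done
printf '\r\033[Kdone (%d files)\n' "$total" >&2
```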

The easiest way to split this across several processes would be to divide the list into X segments and run X separate for loops, each with a different starting point. However, they will probably not all finish at the same time, which is not optimal. A better solution is GNU parallel. You can do something like this:

 find . -type f -print0 | parallel --progress -0 -L 100 -m -j 4 grep -e "$pattern" > results.txt 

(Here, -0 tells parallel to read the null-delimited list produced by -print0, -L 100 says that up to 100 files should be given to each grep instance, and -j 4 runs four parallel processes. I pulled those numbers out of the air; you will probably want to tune them.)


I normally use something like this:

 grep | tee "FILE.txt" | cat -n | sed 's/^/match: /;s/$/ /' | tr '\n' '\r' 1>&2 

This is not ideal, since it only shows progress when there are matches, and if the matches are long or vary in length the display gets garbled, but it should give you the general idea.
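One way around the varying-length problem (my own variation, not part of the answer) is to print only a fixed-format match counter, clearing the line each time:

```shell
# Print a running match count instead of the matches themselves;
# \r plus \033[K redraws the counter in place, so the length of the
# matched lines no longer matters. The grep arguments and FILE.txt
# are placeholders for your own command and output file.
grep "pattern" input.txt | tee FILE.txt |
  awk '{ printf "\rmatches: %d\033[K", NR > "/dev/stderr"
         fflush("/dev/stderr") }                # update immediately
       END { print "" > "/dev/stderr" }'        # final newline
```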

Or simple points:

 grep | tee "FILE.txt" | sed 's/.*//' | tr '\n' '.' 1>&2 

Try the parallel program:

 find * -name \*.[ch] | parallel -j5 --bar '(grep grep-string {})' > output-file 

Although I found it to be slower than the simple:

 find * -name \*.[ch] | xargs grep grep-string > output-file 

I am sure you would need to change the grep source code, and those changes would be huge.

Currently, grep does not know how many lines a file has until it has parsed the entire file. For your requirement, it would need to parse each file twice, or at least determine the total number of lines some other way.

The first pass would determine the line count for the progress bar. The second pass would actually search for your pattern.
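That two-pass idea can be sketched in a wrapper without touching grep at all. A minimal sketch of my own (assuming a single input file; awk's ~ uses ERE matching, so it is only an approximation of grep's behavior):

```shell
#!/usr/bin/env bash
# Two-pass search: pass 1 counts the lines, pass 2 scans them while
# reporting a percentage to stderr. file and pattern are placeholders.
file=input.txt
pattern=needle
total=$(wc -l < "$file")                      # pass 1: total line count
awk -v total="$total" -v pat="$pattern" '
    NR % 1000 == 0 {                          # report every 1000 lines
        printf "\r%d%%", NR * 100 / total > "/dev/stderr"
    }
    $0 ~ pat                                  # pass 2: the actual search
' "$file" > results.txt
printf '\r100%%\n' >&2
```

As the answer notes, reading every file twice roughly doubles the I/O, which is exactly the runtime cost being objected to.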

This would not only increase the runtime, it would also violate one of the basic tenets of the UNIX philosophy:

  • Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new "features". ( source )

There may be other tools that fit your need, but AFAIK grep will not.


This command shows progress (speed and offset), but not the total or a percentage. However, the total can be estimated manually:

 dd if=/input/file bs=1c skip=<offset> | pv | grep -aob "<string>" 
