Parallel file grep pattern

I am successfully using this command to find a list of suspicious IP addresses from the text file ips.txt in the log directory (compressed files):

root@yop# find /mylogs/ -type f -exec zgrep -i -f ips.txt {} \; > ips.result.txt

Now I want to use GNU parallel to speed up the search, but I cannot find the right arguments for this: specifically, how to use the pattern file (one pattern per line) and redirect the output to a result file.

Are there any parallel gurus out there?

The closest I have found is this question: grep-or-anything-else-many-files-with-multiprocessor-power

But I could not get it to work with a pattern list file and export the results to a file as well ...

Please help, thanks everyone.

+4
2 answers

If you just want to run several jobs at once, consider using GNU parallel:

parallel zgrep -i -f ips.txt :::: <(find /mylogs -type f) > results.txt
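
An equivalent form pipes the file list to parallel on stdin. A minimal sketch, assuming GNU parallel and GNU coreutils (for nproc); by default parallel runs one job per core and groups each job's output, so lines from different files do not interleave:

# -j sets the number of concurrent jobs; {} stands for one input file
find /mylogs -type f | parallel -j "$(nproc)" zgrep -i -f ips.txt {} > results.txt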
+4

How about iterating over the files and putting each one in a background job? As Mark noted, this may not be acceptable if you have a very large number of log files. It also assumes the machine is not busy running anything else.

mkdir results

for f in "$(find /mylogs/)"; do 
    (zgrep -i -f ips.txt "$f" >> results/"$f".result &); 
done

wait

cat results/* > ip.results.txt
rm -rf results
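
Writing one result file per log keeps the output of the concurrent zgrep jobs from interleaving; the final cat then merges everything into ip.results.txt.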

You can limit the number of files searched per run using head and/or tail, for example to search only the first 50 files:

for f in "$(find /mylogs/ | head -50)"; do...

Then the next 50:

for f in "$(find /mylogs/ | head -100 | tail -50)"; do...

Etc.
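
If you would rather automate that batching than edit the head/tail numbers by hand, here is a minimal sketch, assuming bash 4+ (for mapfile) and no newlines in filenames; the batch size of 50 is arbitrary:

#!/usr/bin/env bash
# process the log files in fixed-size batches, waiting for each
# batch of background zgrep jobs to finish before starting the next
batch=50
mkdir -p results
mapfile -t files < <(find /mylogs/ -type f)

for ((i = 0; i < ${#files[@]}; i += batch)); do
    for f in "${files[@]:i:batch}"; do
        # note: files in different directories with the same basename will collide
        zgrep -i -f ips.txt "$f" > results/"${f##*/}".result &
    done
    wait
done

cat results/* > ip.results.txt
rm -rf results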

0