I noticed that using -E (or multiple -e options) is faster than using -f. Note that this may not apply directly to your problem, since you are looking for 50,000 lines in a larger file. Still, I wanted to show what can be done and what might be worth checking out:
Here is what I noticed in detail:
Download a 1.2 GB file of random strings:
>ls -has | grep string
1,2G strings.txt
>head strings.txt
Mfzd0sf7RA664UVrBHK44cSQpLRKT6J0
Uk218A8GKRdAVOZLIykVc0b2RH1ayfAy
BmuCCPJaQGhFTIutGpVG86tlanW8c9Pa
etrulbGONKT3pact1SHg2ipcCr7TZ9jc
.....
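If you want to reproduce this locally, a file of the same shape can be generated instead of downloaded. A sketch (hypothetical command, much smaller line count; this is not the author's original data source):

```shell
# Generate 32-character random alphanumeric lines, one per line.
# (1000 lines here; scale the head -n value up for a multi-GB test file.)
tr -dc 'A-Za-z0-9' < /dev/urandom | fold -w 32 | head -n 1000 > strings.txt
```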
Now I want to look for the strings "ab", "cd" and "ef" using different grep approaches:
Using grep without flags, search one by one:
grep "ab" strings.txt > m1.out
2.76s user 0.42s system 96% cpu 3.313 total
grep "cd" strings.txt >> m1.out
2.82s user 0.36s system 95% cpu 3.322 total
grep "ef" strings.txt >> m1.out
2.78s user 0.36s system 94% cpu 3.360 total
In total, the three searches take almost 10 seconds.
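The three one-by-one runs above can be written as a single loop. A self-contained sketch (using a tiny stand-in file instead of the 1.2 GB original):

```shell
# Tiny stand-in for strings.txt so the sketch runs on its own.
printf 'xxabxx\nxxcdxx\nzzz\nqqefqq\n' > strings.txt

# Same one-by-one search as above, written as a loop; note the
# append (>>) so all matches land in one output file.
: > m1.out                      # truncate the output file first
for p in ab cd ef; do
    grep "$p" strings.txt >> m1.out
done
```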
Using grep with the -f flag, with the search strings in search.txt:
>cat search.txt
ab
cd
ef
>grep -F -f search.txt strings.txt > m2.out
31,55s user 0,60s system 99% cpu 32,343 total
For some reason, it takes almost 32 seconds .
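One common culprit for slow `grep -F -f` runs is locale-aware matching; forcing the C locale is a well-known grep speedup and may be worth trying here (this is an assumption about the cause, not something measured above). A self-contained sketch with stand-in inputs:

```shell
# Stand-in inputs so this runs on its own (the real files are larger).
printf 'ab\ncd\nef\n' > search.txt
printf 'xxabxx\nzzz\nyycdyy\n' > strings.txt

# Assumption: the slowdown may be locale-related. LC_ALL=C disables
# multibyte/locale-aware matching, which often speeds grep up a lot.
LC_ALL=C grep -F -f search.txt strings.txt > m2.out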
Now using multiple search patterns with -E
grep -E "ab|cd|ef" strings.txt > m3.out
3,80s user 0,36s system 98% cpu 4,220 total
or
grep --color=auto -e "ab" -e "cd" -e "ef" strings.txt > /dev/null
3,86s user 0,38s system 98% cpu 4,323 total
The third method, using -E, took 4.22 seconds to search the file.
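The `-E "ab|cd|ef"` alternation and the three `-e` options are two ways of expressing the same set of patterns, so they should produce identical matches. A quick check on a stand-in file:

```shell
# Small sample file (hypothetical data, not the 1.2 GB original).
printf 'xxabxx\nzzz\nyyefyy\n' > sample.txt

# Both invocations should select exactly the same lines.
grep -E 'ab|cd|ef' sample.txt > e1.out
grep -e ab -e cd -e ef sample.txt > e2.out
cmp e1.out e2.out
```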
Now let's check if the results match:
cat m1.out | sort | uniq > m1.sort
cat m3.out | sort | uniq > m3.sort
diff m1.sort m3.sort
diff produces no output, which means the results are the same.
Perhaps this is worth a try; otherwise, I would advise you to look at the topic "The fastest grep possible", see the comment from Cyrus.
cb0