How to find two words in one line?

How do I grep for lines that contain two given words? I am looking for lines that contain both words. How can I do this? I tried a pipe like this:

grep -c "word1" | grep -r "word2" logs 

It just gets stuck after the first pipe command.

Why?

+99
grep
Jun 25 2018-11-11T00:
8 answers

Why are you passing -c ? It will just show the number of matching lines. Similarly, there is no reason to use -r . I suggest you read man grep .

To grep for two words that exist on the same line, simply do:

 grep "word1" FILE | grep "word2" 

grep "word1" FILE will print all lines from FILE that contain word1, and then grep "word2" will print those of them that also contain word2. Therefore, if you combine them using a pipe, the result is the lines containing both word1 and word2.

If you just need to count how many lines contain both words, do:

 grep "word1" FILE | grep -c "word2" 

Also, to answer the question of why it was stuck: in grep -c "word1" you did not specify a file, so grep reads its input from stdin and appears to hang. You can press Ctrl + D to send EOF (end of file) so that it exits.
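A minimal sketch of both commands above, assuming a throwaway sample file created with mktemp (the file and its contents are made up for illustration):

```shell
# Hypothetical sample file; the contents are illustrative only.
sample=$(mktemp)
printf '%s\n' \
  'word1 then word2' \
  'word2 then word1' \
  'only word1 here' \
  'only word2 here' > "$sample"

# Lines containing both words, in either order.
both=$(grep "word1" "$sample" | grep "word2")

# Number of such lines: -c goes on the last grep in the pipeline.
count=$(grep "word1" "$sample" | grep -c "word2")

printf '%s\n' "$both"
printf 'count: %s\n' "$count"

rm -f "$sample"
```

Because each grep is given either a file or the output of the previous command, nothing waits on the keyboard.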

+140
Jun 25 2018-11-21T00:

Prescription

One simple rewrite of the command in the question is:

 grep "word1" logs | grep "word2" 

The first grep finds lines with "word1" from the "logs" file, and then passes them to the second grep , which looks for lines containing "word2".

However, there is no need to use two commands. You can use extended grep ( grep -E or egrep ):

 grep -E 'word1.*word2|word2.*word1' logs 

If you know that the word "word1" will precede "word2" on the line, you don't even need alternatives, and regular grep will do:

 grep 'word1.*word2' logs 

The single-command variants have the advantage that only one process is running, so lines containing "word1" do not have to be piped to a second process. How much this matters depends on how big the data file is and how many lines match "word1". If the file is small, performance is unlikely to be an issue and running two commands is fine. If the file is big but only a few lines contain "word1", not much data passes through the pipe, and using two commands is still fine. However, if the file is huge and "word1" occurs frequently, you may be passing a significant amount of data through the pipe, and a single command avoids that overhead. On the other hand, the regex is more complex; you may need to benchmark it to find out which is best, but only if performance really matters. If you do run two commands, aim to select the less frequently occurring word in the first grep to minimize the amount of data the second one has to process.
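Before benchmarking, it may be worth confirming that the two forms select the same lines. A minimal sketch, assuming a temporary file stands in for logs (its contents are invented for illustration):

```shell
# Hypothetical stand-in for the "logs" file.
logs=$(mktemp)
printf '%s\n' \
  'word1 comes before word2' \
  'word2 comes before word1' \
  'word1 alone' \
  'word2 alone' \
  'nothing relevant' > "$logs"

# Two processes connected by a pipe.
piped=$(grep 'word1' "$logs" | grep 'word2')

# One process with an alternation covering both orders.
single=$(grep -E 'word1.*word2|word2.*word1' "$logs")

# One process, but only for the case where word1 precedes word2.
ordered=$(grep 'word1.*word2' "$logs")

if [ "$piped" = "$single" ]; then same=yes; else same=no; fi
printf 'pipeline and alternation agree: %s\n' "$same"

rm -f "$logs"
```

To compare speed on real data, each command can be prefixed with Bash's time keyword; the result depends on the file size and on how common each word is.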

Diagnostics

Initial script:

 grep -c "word1" | grep -r "word2" logs 

This is an odd sequence of commands. The first grep counts the lines containing "word1" on its standard input and prints that number on its standard output. Until you indicate EOF (for example, by typing Control-D ), it sits there waiting for you to type something. The second grep performs a recursive search for "word2" in the files under the logs directory (or, if logs is a file, in that file). In my case it fails, because there is neither a file nor a directory called logs where I ran the pipeline. Note that the second grep does not read its standard input at all, so the pipe is superfluous.

With Bash, the parent shell waits until all processes in the pipeline have exited, so it sits waiting for grep -c to finish, which it will not do until you indicate EOF. Therefore, your command appears to be stuck. With Heirloom Shell, the second grep completes and exits, and the shell prompts again. Now you have two processes, the first grep and the shell, both trying to read from the keyboard, and it is not determined which one gets any given line of input (or any given EOF indication).

Note that even if you entered the data as input for the first grep , you would only get lines containing "word2" shown in the output.
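The behaviour described above can be reproduced without hanging by giving the first grep an immediate EOF. A minimal sketch, assuming a temporary directory with one invented file stands in for logs:

```shell
# Hypothetical directory standing in for "logs".
dir=$(mktemp -d)
printf 'has word2\n' > "$dir/a.log"

# Redirecting stdin from /dev/null gives the first grep an immediate EOF,
# so it prints its count into the pipe and exits instead of waiting
# for the keyboard. The second grep is given a path, so it never reads
# the pipe, and the count is silently discarded.
out=$(grep -c "word1" < /dev/null | grep -r "word2" "$dir")

printf '%s\n' "$out"

rm -rf "$dir"
```

Only the match found under the directory appears in the output; the count written by the first grep never shows up.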




Footnote:

At one point, the answer used:

 grep -E 'word1.*word2|word2.*word1' "$@"
 grep 'word1.*word2' "$@"

This triggered the comments below.

+57
Jun 26 2018-11-11T00:

You can use awk, like this:

 cat <yourFile> | awk '/word1/ && /word2/' 

The order of the words does not matter. For example, if a file named file1 contains:

 word1 is in this file as well as word2
 word2 is in this file as well as word1
 word4 is in this file as well as word1
 word5 is in this file as well as word2

then

 /tmp$ cat file1| awk '/word1/ && /word2/' 

will result in

 word1 is in this file as well as word2
 word2 is in this file as well as word1

Yes, awk is slower than grep.
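awk can also read the file directly, so the cat is optional. A minimal sketch, reusing the sample lines from this answer in a temporary file (the mktemp file is an assumption for illustration):

```shell
# Temporary copy of the sample data shown in this answer.
file1=$(mktemp)
printf '%s\n' \
  'word1 is in this file as well as word2' \
  'word2 is in this file as well as word1' \
  'word4 is in this file as well as word1' \
  'word5 is in this file as well as word2' > "$file1"

# A pattern with no action prints every line for which it is true;
# && requires both regexes to match the same line, in any order.
matches=$(awk '/word1/ && /word2/' "$file1")

printf '%s\n' "$matches"

rm -f "$file1"
```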

+8
Jun 03 '14 at 13:21

The main problem is that you did not give the first grep any input. You will need to reorder your command to something like

 grep "word1" logs | grep "word2" 

If you want to count the occurrences, then put '-c' on the second grep.

+7
Nov 26

Try cat with the command below (note that multiple -e patterns match lines containing either word, not necessarily both):

 cat log|grep -e word1 -e word2 
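
Multiple -e patterns are combined with OR, so it helps to compare this form with a pipeline. A minimal sketch, assuming an invented four-line temporary file in place of log:

```shell
# Hypothetical stand-in for the "log" file.
log=$(mktemp)
printf '%s\n' \
  'word1 and word2' \
  'word1 only' \
  'word2 only' \
  'neither' > "$log"

# Several -e patterns: a line is selected if ANY pattern matches.
either=$(grep -c -e word1 -e word2 "$log")

# A pipeline of two greps: a line survives only if BOTH match.
both=$(grep word1 "$log" | grep -c word2)

printf 'either: %s, both: %s\n' "$either" "$both"

rm -f "$log"
```

On this sample the -e form counts three lines, while only one line contains both words.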
+5
Aug 28 '13 at 8:41

grep word1 file_name | grep word2

This seems the easiest way to me.

+1
Apr 21 '15 at 8:26

Use grep:

 grep -wE "string1|String2|...." file_name 

Or you can use:

 echo string | grep -wE "string1|String2|...." 
0
Feb 24 '15 at 6:47

git grep

Here is the syntax using git grep combining multiple patterns using logical expressions:

 git grep -e pattern1 --and -e pattern2 --and -e pattern3 

The above command will print lines matching all patterns at once.

If the files are not under version control, add the --no-index option, which searches files in the current directory that are not managed by Git.

Check out man git-grep for help.


-1
Dec 22 '16 at 16:41


