Matching whole words in a text file, given a list of words

Note:

Before I get down to business, I would like to point out some other SO posts that didn’t quite answer my question and were not duplicates of this:

Background:

I have a list of words in a file called words.txt (one word per line). I would like to find all lines in another, much larger file named file.txt that contain any of the words from words.txt. However, I only want whole-word matches. That is, a line from file.txt should match when it contains at least one instance where a word from words.txt stands "all by itself" (I know this is vague, so let me explain).

In other words, a line should match when the word is:

  • Alone on the line
  • Surrounded by characters that are neither alphanumeric nor a hyphen
  • At the beginning of the line and followed by a character that is neither alphanumeric nor a hyphen
  • At the end of the line and preceded by a character that is neither alphanumeric nor a hyphen

For example, if one of the words in words.txt is cat, I would expect the following behavior:

cat              #=> match
cat cat cat      #=> match
the cat is gray  #=> match
mouse,cat,dog    #=> match
caterpillar cat  #=> match
caterpillar      #=> no match
concatenate      #=> no match
bobcat           #=> no match
catcat           #=> no match
cat100           #=> no match
cat-in-law       #=> no match
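
To make the cases above reproducible, here is a minimal sketch that writes them to disk. The scratch directory and the expected.txt file are assumptions added for illustration; only words.txt and file.txt come from the question.

```shell
# Hypothetical scratch directory; any writable location works.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# words.txt: one word per line (only "cat" in this example).
printf '%s\n' cat > words.txt

# file.txt: the example lines, without the "#=>" annotations.
cat > file.txt <<'EOF'
cat
cat cat cat
the cat is gray
mouse,cat,dog
caterpillar cat
caterpillar
concatenate
bobcat
catcat
cat100
cat-in-law
EOF

# The first five lines are the ones marked "match" above.
head -n 5 file.txt > expected.txt
```

Any candidate command can then be checked by comparing its output against expected.txt with diff.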

Attempt:

I know that grep gets me most of the way there. For example:

grep -wf words.txt file.txt

The relevant options, from the man page:

-w, --word-regexp
       Select only those lines containing matches that form whole words.
       The test is that the matching substring must either be at the beginning
       of the line, or preceded by a non-word constituent character.
       Similarly, it must be either at the end of the line or followed by a
       non-word constituent character. Word-constituent characters are
       letters, digits, and the underscore.
-f FILE, --file=FILE
       Obtain patterns from FILE, one per line. The empty file contains
       zero patterns, and therefore matches nothing.

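The problem is that grep treats the hyphen (i.e. -) as a "non-word constituent character". As a result, the word cat matches the line cat-in-law, which is not what I want.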

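I understand that -w is working as designed here. In my case, however, a word (e.g. cat) that appears only as part of a hyphenated compound (e.g. cat-in-law) should not count as a match.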
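
The behavior of -w around hyphens, letters, and digits can be checked directly. This is a small sketch that feeds a few of the example lines to grep on standard input; the variable names are made up for illustration.

```shell
# "-" is not a word constituent, so -w accepts "cat" inside "cat-in-law".
hyphen_hit=$(printf '%s\n' 'cat-in-law' | grep -w 'cat')

# Letters and digits are word constituents, so none of these lines match.
# "|| true" only hides grep's non-zero exit status when the count is 0.
other_hits=$(printf '%s\n' 'cat100' 'bobcat' 'catcat' | grep -cw 'cat' || true)

echo "hyphenated line matched: $hyphen_hit"
echo "matches among cat100/bobcat/catcat: $other_hits"
```

Here the hyphenated line is reported while the other three are not, so the hyphen is the only case where -w disagrees with the rules listed in the question.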

I also know that I could write regular expressions into words.txt myself and then use something like:

grep -Ef words.txt file.txt

-E, --extended-regexp
              Interpret PATTERN as an extended regular expression

However, I would prefer to leave words.txt unmodified.

Question:

Is there a simple way to do this in bash?


Answer:

One way to do this is to generate the patterns on the fly:

grep -Ef <(awk '{print "([^a-zA-Z0-9-]|^)"$0"([^a-zA-Z0-9-]|$)"}' words.txt) file.txt

Explanation:

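  • words.txt is the list of words (one per line).
  • file.txt is the file to search.
  • awk reads words.txt and wraps each word in a pattern requiring that it be preceded by a character that is neither alphanumeric nor a hyphen (or by the start of the line) and followed by such a character (or by the end of the line).
  • The awk command is wrapped in process substitution, <( ), so that its output can be passed to -f as if it were a file.
  • -E is needed because the patterns generated from words.txt are extended regular expressions.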
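
To see what the steps above produce, the following sketch writes the generated patterns to a file instead of using process substitution, so they can be inspected. The scratch directory, the patterns.txt name, and the five sample lines are assumptions for illustration; the awk program is the one from the command above.

```shell
workdir=$(mktemp -d)   # hypothetical scratch directory
cd "$workdir" || exit 1
printf '%s\n' cat > words.txt
printf '%s\n' 'the cat is gray' 'mouse,cat,dog' 'caterpillar' 'cat100' 'cat-in-law' > file.txt

# Same awk program as in the answer; one pattern is emitted per word.
awk '{print "([^a-zA-Z0-9-]|^)"$0"([^a-zA-Z0-9-]|$)"}' words.txt > patterns.txt
cat patterns.txt
# prints: ([^a-zA-Z0-9-]|^)cat([^a-zA-Z0-9-]|$)

# Use the generated patterns exactly as -f would receive them.
matches=$(grep -Ef patterns.txt file.txt)
printf '%s\n' "$matches"
# prints the two lines "the cat is gray" and "mouse,cat,dog"
```

Because each word is pasted into the pattern verbatim, a word containing regex metacharacters would be interpreted as a regular expression rather than as literal text.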

This way, words.txt itself does not need to be modified.
