$ awk 'BEGIN{FS="|";IGNORECASE=1} $5 ~ "conclusions.*" $2 ".*" $3' my_file.txt
1|substance1|substance2|red|CONCLUSIONS: the effect of SUBSTANCE1 and SUBSTANCE2 in humans...|
How it works
BEGIN{FS="|";IGNORECASE=1}
This part is unchanged from the code in the question.
$5 ~ "conclusions.*" $2 ".*" $3
This is a condition: it is true if $5 matches the regular expression formed by concatenating four strings: "conclusions.*", $2, ".*", and $3.
We did not specify an action for this condition. Therefore, if the condition is true, awk performs the default action, which is to print the line.
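To make the concatenation concrete, the following sketch prints the string that awk assembles for one line of input. The sample line is made up to mirror the layout shown above; when this string appears on the right-hand side of ~, awk treats it as a regular expression.

```shell
# Hypothetical one-line input in the same pipe-delimited layout as my_file.txt.
line='1|substance1|substance2|red|CONCLUSIONS: the effect of SUBSTANCE1 and SUBSTANCE2 in humans...|'

# Print the string built from the four pieces instead of matching against it.
regex=$(printf '%s\n' "$line" | awk 'BEGIN{FS="|"} {print "conclusions.*" $2 ".*" $3}')
printf '%s\n' "$regex"
```

Printing the assembled string like this can be a quick way to check what pattern a given line will actually be tested against.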
Simple examples
Consider:
$ echo "aa aa" | awk '$2 ~ /$1/'
This prints nothing because awk does not expand variables inside a regex constant (/.../).
Note that no match is found here either:
$ echo '$1' | awk '$0 ~ /$1/'
There is no match because, inside a regular expression, $ matches only at the end of the line. Thus, /$1/ would only match the end of the line followed by 1, so it never matches here. To get a match, we need to escape the dollar sign:
$ echo '$1' | awk '$0 ~ /\$1/'
$1
To build a regex from awk variables, we can, as in the main command of this answer, use the variable directly on the right-hand side of ~:
$ echo "aa aa" | awk '$2 ~ $1'
aa aa
This does produce a match.
Further improvement
As Ed Morton suggests in the comments, it can be important to require that the substances match only whole words. In that case, we can use \\<...\\> (word-boundary operators, a GNU awk extension) to restrict each substance to whole-word matches. Thus:
awk 'BEGIN{FS="|";IGNORECASE=1} $5 ~ "conclusions.*\\<" $2 "\\>.*\\<" $3 "\\>"' my_file.txt
This way, substance1 will not match substance10.
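A small sketch of why the boundaries matter, using made-up data in which the first line mentions substance10 rather than substance1. The sample text is all lowercase so that the first command does not depend on IGNORECASE; the second command is guarded because \< and \> are assumed to be available only in GNU awk.

```shell
# Hypothetical sample data: line 1 mentions substance10, not substance1.
data='1|substance1|substance2|red|conclusions: substance10 and substance2 interact|
2|substance1|substance2|red|conclusions: substance1 and substance2 interact|'

# Without word boundaries, "substance1" also matches inside "substance10",
# so both lines are printed (line 1 is a false positive).
loose=$(printf '%s\n' "$data" | awk 'BEGIN{FS="|"} $5 ~ "conclusions.*" $2 ".*" $3')
printf '%s\n' "$loose"

# With \< and \> the false positive should be rejected and only line 2 printed.
# Run this part only when gawk is installed.
if command -v gawk >/dev/null 2>&1; then
    printf '%s\n' "$data" | gawk 'BEGIN{FS="|"} $5 ~ "conclusions.*\\<" $2 "\\>.*\\<" $3 "\\>"'
fi
```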