REGEX - matches the Nth word of a string containing a specific word

I am trying to make the correct REGEX to complete this task:

Match the Nth word of a string containing a specific word

For instance:

Input:

this is the first line - blue
this is the second line - green
this is the third line - red

I want to match the 7th word of the lines containing the word "second".

Expected output:

 green 

Does anyone know how to do this?

I use http://rubular.com/ to test REGEX.

I already tried this REGEX without success; it matches the next line instead:

 (.*second.*)(?<data>.*?\s){7}(.*) 

--- UPDATED ---

Example 2

Input:

 this is the Foo line - blue
 this is the Bar line - green
 this is the Test line - red

I want to match the 4th word of the lines containing the word "red".

Expected output:

 Test 

In other words, the word I want to match can come before or after the word that I use to select the line.

2 answers

You can use this to match the line containing second and capture its 7th word:

 ^(?=.*\bsecond\b)(?:\S+ ){6}(\S+) 

Make sure the global and multiline flags are enabled.

^ matches the beginning of a line.

(?=.*\bsecond\b) is a positive lookahead that makes sure the word second appears somewhere in this line.

(?:\S+ ){6} matches the first 6 words, each followed by a single space.

(\S+) captures the 7th word.
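
As a quick check outside an online tester, here is a minimal sketch in Python (an assumption; the question itself uses Rubular, which is Ruby-based). It assumes the sample input is three separate lines. re.MULTILINE stands in for the multiline flag and re.findall for the global flag.

```python
import re

# Sample input from the question, assumed to be one entry per line.
text = (
    "this is the first line - blue\n"
    "this is the second line - green\n"
    "this is the third line - red\n"
)

# re.MULTILINE lets ^ match at the start of every line.
# The lookahead cannot cross a newline because . does not match "\n".
pattern = re.compile(r"^(?=.*\bsecond\b)(?:\S+ ){6}(\S+)", re.MULTILINE)

# findall returns the contents of the single capture group for each match.
matches = pattern.findall(text)
print(matches)  # ['green']
```

Note that the lone "-" counts as a word here, which is why green is the 7th word.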


You can apply the same principle to other requirements.

For example, to get the 4th word of a line containing red:

 ^(?=.*\bred\b)(?:\S+ ){3}(\S+) 
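
Since only the selecting word and the repeat count change between the two patterns, the idea can be wrapped in a small helper. The sketch below is in Python, and the function name nth_word_of_lines_containing is hypothetical, not part of any library. It assumes words are separated by single spaces, as in the examples.

```python
import re

def nth_word_of_lines_containing(text, word, n):
    """Return the n-th (1-based) word of every line that contains `word`.

    Hypothetical helper: builds the same pattern as above, with the
    selecting word escaped and the repeat count set to n - 1.
    """
    pattern = re.compile(
        r"^(?=.*\b" + re.escape(word) + r"\b)(?:\S+ ){" + str(n - 1) + r"}(\S+)",
        re.MULTILINE,
    )
    return pattern.findall(text)

# Second example from the question, assumed to be one entry per line.
sample = (
    "this is the Foo line - blue\n"
    "this is the Bar line - green\n"
    "this is the Test line - red\n"
)
print(nth_word_of_lines_containing(sample, "red", 4))  # ['Test']
```

Because the lookahead scans the whole line, the selecting word can sit before or after the word being captured.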

You asked for a regex and you got a very good answer.

Sometimes it is better to ask for a solution without specifying the tool.

Here is a one-liner that, it seems to me, suits you best:

 awk '/second/ {print $7}' < inputFile.txt 

Explanation:

 /second/ - for any line that matches this regex (in this case, literal 'second')
 print $7 - print the 7th field (by default, fields are separated by space)

I think this is much easier to understand than the regular expression, and it is more flexible for this kind of processing.
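
For comparison, the same field-splitting idea can be sketched in Python. This is only an illustration of the approach, not a replacement for the awk command. The function name is hypothetical, and unlike awk (which prints an empty line when a matching line has fewer fields), this sketch skips such lines.

```python
def fields_of_matching_lines(lines, needle, field_number):
    """Collect the field_number-th (1-based) whitespace-separated field
    of every line that contains `needle` as a substring."""
    results = []
    for line in lines:
        if needle in line:
            fields = line.split()  # splits on runs of whitespace
            # Assumption: lines with too few fields are skipped.
            if len(fields) >= field_number:
                results.append(fields[field_number - 1])
    return results

lines = [
    "this is the first line - blue",
    "this is the second line - green",
    "this is the third line - red",
]
print(fields_of_matching_lines(lines, "second", 7))  # ['green']
```

Like /second/ in awk, the substring test here also matches lines where second is part of a longer word.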
