Take the nth occurrence between two patterns using awk or sed

I'm having trouble parsing a file: I want to grab the nth occurrence of the text between two patterns, preferably using awk or sed.

 category
 1
 s
 t
 done
 category
 2
 n
 d
 done
 category
 3
 r
 d
 done
 category
 4
 t
 h
 done

Say that in this example I want to capture the third occurrence of the text between category and done, so the output would be:

 category
 3
 r
 d
 done
5 answers

This may work for you (GNU sed):

 sed '/^category/{x;s/^/X/;/^X\{3\}$/ba;x};d;:a;x;:b;$!{n;/^done/!bb}' file 

As an alternative:

 sed -nr '/^category/H;//,/^done/G;s/\n(\n[^\n]*){3}$//p' file 

Or if you prefer awk:

 awk '/^category/,/^done/{if(++m==1)n++;if(n==3)print;if(/^done/)m=0}' file 

Try this:

  awk -vn=3 '/^category/{l++} (l==n){print}' file.txt 

Or, more tersely:

 awk -vn=3 '/^category/{l++} l==n' file.txt 

If your file is large:

 awk -vn=3 '/^category/{l++} l>n{exit} l==n' file.txt 

If your file does not contain any null characters, here is one way using GNU sed. It finds the third occurrence of the pattern range, but you can easily change the count to get any occurrence you like.

 sed -n '/^category/ { x; s/^/\x0/; /^\x0\{3\}$/ { x; :a; p; /done/q; n; ba }; x }' file.txt 

Results:

 category
 3
 r
 d
 done

Explanation:

Turn off default printing with the -n switch. Match the word category at the start of a line. Swap the pattern space with the hold space and prepend a null character to the pattern space. In the example, if the pattern space then consists of exactly three null characters, swap the matched line back out of the hold space. Now start a loop and print the contents of the pattern space until the closing pattern is matched. When that closing pattern is found, sed quits. If it is not found, sed reads the next line of input and continues the loop.
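The steps described above can be laid out one command per line, which makes them easier to follow. This is a sketch rather than a different method: it is the same GNU sed script with comments added, run against the sample from the question recreated in a temporary file. The `sample` and `result` variable names are illustrative assumptions.

```shell
# Recreate the sample from the question, one token per line.
sample=$(mktemp)
printf '%s\n' category 1 s t done category 2 n d done \
              category 3 r d done category 4 t h done > "$sample"

# Same logic as the one-liner, spread out with comments (GNU sed assumed).
result=$(sed -n '
/^category/ {
    # swap pattern space and hold space (the hold space acts as the counter)
    x
    # prepend one NUL for every "category" line seen so far
    s/^/\x0/
    # exactly three NULs means this is the third occurrence
    /^\x0\{3\}$/ {
        # swap back so the pattern space holds the "category" line again
        x
        :a
        p
        # quit once the closing line has been printed
        /done/q
        n
        ba
    }
    # not the third occurrence yet, so swap back and carry on
    x
}
' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

On the sample input this should print the five lines of the third block; changing `\{3\}` to another count selects a different occurrence.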

 awk -v tgt=3 '
 /^category$/ { fnd=1; rec="" }
 fnd {
     rec = rec $0 ORS
     if (/^done$/) {
         if (++cnt == tgt) {
             printf "%s",rec
             exit
         }
         fnd = 0
     }
 }
 ' file
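For reuse, the script above could be wrapped in a small shell function. The sketch below assumes a hypothetical helper name, `nth_block`, and feeds it input with stray `foo` and `bar` lines between blocks, to illustrate that buffering each block until its `done` line keeps unrelated lines out of the result.

```shell
# Usage: nth_block N FILE   (hypothetical helper name, not part of the answer)
nth_block() {
    awk -v tgt="$1" '
    /^category$/ { fnd = 1; rec = "" }      # a block starts: reset the buffer
    fnd {
        rec = rec $0 ORS                    # collect lines while inside a block
        if (/^done$/) {                     # the block is complete
            if (++cnt == tgt) { printf "%s", rec; exit }
            fnd = 0                         # skip lines until the next block
        }
    }' "$2"
}

# Sample with stray lines between the blocks (assumed test data).
sample=$(mktemp)
printf '%s\n' category 1 s t done category 2 n d done foo \
              category 3 r d done bar category 4 t h done > "$sample"

third=$(nth_block 3 "$sample")
printf '%s\n' "$third"
rm -f "$sample"
```

Passing the count as the first argument means the same function can fetch any block, for example `nth_block 2 file`.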

With GNU awk, you can set the record separator to a regular expression:

 <file awk 'NR==n+1 { print rt, $0 } { rt = RT }' RS='\\<category' ORS='' n=3 

Output:

 category
 3
 r
 d
 done

RT holds the text that matched the record separator. Note that the record number is off by one relative to n, because the first record is whatever precedes the first RS.

Edit

According to Ed's comment, this will not work if the records have other data between them, for example:

 category
 1
 s
 t
 done
 category
 2
 n
 d
 done
 foo
 category
 3
 r
 d
 done
 bar
 category
 4
 t
 h
 done

One way around this is to clean up the input first with a second awk:

 <file awk '/^category$/,/^done$/' | awk 'NR==n+1 { print rt, $0 } { rt = RT }' RS='\\<category' ORS='' n=3 

Output:

 category
 3
 r
 d
 done
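The pre-filtering step can be checked on its own, and it needs only a POSIX awk. A minimal sketch, assuming the interleaved sample is recreated in a temporary file (the variable names are illustrative):

```shell
# Recreate the interleaved sample, one token per line.
sample=$(mktemp)
printf '%s\n' category 1 s t done category 2 n d done foo \
              category 3 r d done bar category 4 t h done > "$sample"

# Keep only the lines from each "category" line through the next "done" line.
filtered=$(awk '/^category$/,/^done$/' "$sample")

# Count what is left: 4 blocks of 5 lines each, with "foo" and "bar" removed.
line_count=$(printf '%s\n' "$filtered" | wc -l | tr -d ' ')
stray_count=$(printf '%s\n' "$filtered" | awk '/^(foo|bar)$/ { c++ } END { print c + 0 }')
printf 'lines=%s stray=%s\n' "$line_count" "$stray_count"
rm -f "$sample"
```

With the 22-line sample this reports `lines=20 stray=0`, which is the clean input the record-separator approach expects.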

Edit 2

As Ed noted in the comments, the methods above do not look for the closing pattern. One way to do this that the other answers have not covered is getline (note that there are some caveats with awk's getline):

 <file awk '
 /^category$/ {
   v = $0
   while(!/^done$/) {
     if(!getline)
       exit
     v = v ORS $0
   }
   if(++nr == n)
     print v
 }' n=3

In one line:

 <file awk '/^category$/ { v = $0; while(!/^done$/) { if(!getline) exit; v = v ORS $0 } if(++nr == n) print v }' n=3 
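One of those caveats is that getline returns 1 on success, 0 at end of input, and -1 on error, so `!getline` is only true at end of input. A more defensive sketch, wrapped in a hypothetical function named `nth_block_getline`, treats any non-positive return as a reason to stop and also exits once the requested block has been printed. Both are changes from the one-liner above, not part of the original answer.

```shell
# Usage: nth_block_getline N FILE   (hypothetical helper name)
nth_block_getline() {
    awk -v n="$1" '
    /^category$/ {
        v = $0
        while (!/^done$/) {
            # stop on end of input (0) or on a read error (-1)
            if ((getline) <= 0) exit
            v = v ORS $0
        }
        # print the requested block and stop reading
        if (++nr == n) { print v; exit }
    }' "$2"
}

# Complete sample from the question, one token per line.
sample=$(mktemp)
printf '%s\n' category 1 s t done category 2 n d done \
              category 3 r d done category 4 t h done > "$sample"
third=$(nth_block_getline 3 "$sample")
printf '%s\n' "$third"

# A block that is cut off before its "done" line yields no output.
printf '%s\n' category 1 s t > "$sample"
truncated=$(nth_block_getline 1 "$sample")
rm -f "$sample"
```

The truncated case is worth keeping in mind: an unterminated block is silently dropped rather than printed partially.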
