Take the nth occurrence between two patterns using awk or sed

Question

I have a problem parsing the output in a file: I want to grab the nth occurrence of the text between two patterns, preferably using awk or sed. The input looks like this:

 category
 1
 s
 t
 done
 category
 2
 n
 d
 done
 category
 3
 r
 d
 done
 category
 4
 t
 h
 done

For example, say I want to capture the third occurrence of the text between category and done. The output should then be:

 category
 3
 r
 d
 done
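To try the answers below, the sample input can be recreated with printf. This sketch assumes each word sits on its own line, which is what anchored patterns such as /^category$/ in the answers expect. The name file.txt matches the answers, and any existing file with that name is overwritten.

```shell
# Write the question's sample input, one token per line, into file.txt
# (overwrites any existing file.txt).
printf '%s\n' \
  category 1 s t done \
  category 2 n d done \
  category 3 r d done \
  category 4 t h done > file.txt
```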

shell awk sed

Dan lawless Nov 08 '12 at 2:27

5 answers

Try this:

  awk -vn=3 '/^category/{l++} (l==n){print}' file.txt

Or, more tersely (a pattern with no action prints the matching line):

 awk -vn=3 '/^category/{l++} l==n' file.txt

If your file is large, stop reading as soon as you are past the nth block:

 awk -vn=3 '/^category/{l++} l>n{exit} l==n' file.txt
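If n comes from a shell variable, it can be passed with -v instead of being hard-coded. The following is a minimal sketch that wraps the early-exit version in a shell function. The function name nth_block and its argument order are hypothetical. It reads the named files, or standard input if none are given.

```shell
# Hypothetical helper: print the nth block that starts with a "category" line.
# Usage: nth_block N [FILE...]   (reads standard input if no FILE is given)
nth_block() {
  n=$1
  shift
  awk -v n="$n" '
    /^category/ { l++ }     # count the start lines seen so far
    l > n       { exit }    # past the wanted block: stop reading early
    l == n                  # inside the wanted block: print the line
  ' "$@"
}

# Example: the third block of the question's sample input
printf '%s\n' category 1 s t done category 2 n d done \
  category 3 r d done category 4 t h done | nth_block 3
```

This prints the five lines category, 3, r, d and done.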

Gilles quenot Nov 08 '12 at 2:30

If your file does not contain any null characters, here is a way to do it with GNU sed. It detects the third occurrence of the range of patterns, but you can easily change the count to get any occurrence you like.

 sed -n '/^category/ { x; s/^/\x0/; /^\x0\{3\}$/ { x; :a; p; /done/q; n; ba }; x }' file.txt

Result:

 category
 3
 r
 d
 done

Explanation:

Disable default printing with the -n switch. Match the word category at the beginning of a line. Swap the pattern space with the hold space and prepend a null character to it, so the hold space acts as a counter of the category lines seen so far. If the pattern space now consists of exactly three null characters (in the example, the third occurrence), swap the category line back into the pattern space. Then start a loop that prints the pattern space until the end pattern is matched. When done is found, sed quits. Otherwise, n reads the next line of input and the loop continues. For every other category line, the final x swaps the counter back into the hold space.
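The count is hard-coded as \{3\} in the regex above. The following sketch takes the count from a shell variable instead. It writes the script one command per line, so it relies on neither GNU's \x0 escape nor semicolons after labels. Because the hold space only ever stores the counter, any character works as the marker, and this sketch uses X as potong's answer does. The function name nth_sed is hypothetical.

```shell
# Hypothetical helper: print the nth category...done block with sed.
# Usage: nth_sed N [FILE...]
nth_sed() {
  n=$1
  shift
  # Inside double quotes, $n expands to the count and \$ stays a literal $ anchor.
  sed -n "
/^category/ {
  x
  s/^/X/
  /^X\{$n\}\$/ {
    x
    :a
    p
    /done/q
    n
    ba
  }
  x
}
" "$@"
}

printf '%s\n' category 1 s t done category 2 n d done \
  category 3 r d done category 4 t h done | nth_sed 3
```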

Steve Nov 08 '12 at 4:00

 awk -v tgt=3 '
 /^category$/ { fnd=1; rec="" }
 fnd {
     rec = rec $0 ORS
     if (/^done$/) {
         if (++cnt == tgt) {
             printf "%s",rec
             exit
         }
         fnd = 0
     }
 }
 ' file
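Here is a variant of the same buffering approach with the start and end regexes passed in with -v as well, so the script can be reused for other delimiters. The function name nth_range and the awk variable names start and stop are assumptions for illustration. Note that awk processes escape sequences in -v values, so a backslash in a regex has to be doubled there.

```shell
# Hypothetical helper: nth_range N START_RE STOP_RE [FILE...]
nth_range() {
  tgt=$1 start=$2 stop=$3
  shift 3
  awk -v tgt="$tgt" -v start="$start" -v stop="$stop" '
    $0 ~ start { fnd = 1; rec = "" }   # a new block begins: reset the buffer
    fnd {
      rec = rec $0 ORS                 # collect the lines of the current block
      if ($0 ~ stop) {                 # the block is complete
        if (++cnt == tgt) {
          printf "%s", rec             # print only the tgt-th complete block
          exit
        }
        fnd = 0
      }
    }
  ' "$@"
}

printf '%s\n' category 1 s t done category 2 n d done \
  category 3 r d done category 4 t h done |
  nth_range 3 '^category$' '^done$'
```

Lines outside a block (such as stray text between done and the next category) are never buffered, because fnd is reset once a block ends.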

Ed morton Nov 08 '12 at 13:20

With GNU awk, you can set the record separator to a regular expression:

 <file awk 'NR==n+1 { print rt, $0 } { rt = RT }' RS='\\<category' ORS='' n=3

Output:

 category
 3
 r
 d
 done

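RT holds the text that actually matched RS for the current record. Note that the record number is off by one with respect to n, since the first record is whatever precedes the first RS. That is why the script tests NR==n+1 and prints the separator saved from the previous record.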

Edit

As Ed pointed out in a comment, this does not work if there is other data between the records, for example:

 category
 1
 s
 t
 done
 category
 2
 n
 d
 done
 foo
 category
 3
 r
 d
 done
 bar
 category
 4
 t
 h
 done

One way around this is to clean up the input with a second (or rather, a preceding) awk:

 <file awk '/^category$/,/^done$/' | awk 'NR==n+1 { print rt, $0 } { rt = RT }' RS='\\<category' ORS='' n=3

Output:

 category
 3
 r
 d
 done
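The same pre-filtering idea also works without GNU awk's RT. Once the range pattern has removed the stray lines, the counting approach from the first answer can select the nth block with any POSIX awk. The following is a sketch with the hypothetical function name nth_clean.

```shell
# Hypothetical helper: nth_clean N [FILE...]
nth_clean() {
  n=$1
  shift
  # The first awk keeps only lines from a category line through the next done line;
  # the second awk counts category lines and prints the nth block.
  awk '/^category$/,/^done$/' "$@" |
    awk -v n="$n" '/^category$/ { l++ } l == n'
}

printf '%s\n' category 1 s t done category 2 n d done foo \
  category 3 r d done bar category 4 t h done | nth_clean 3
```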

Edit 2

As Ed noted in the comments, the methods above do not look for the end pattern. One way to do that, which the other answers have not used, is getline (note that there are some caveats with awk's getline):

 <file awk '
 /^category$/ {
   v = $0
   while(!/^done$/) {
     if(!getline) exit
     v = v ORS $0
   }
   if(++nr == n) print v
 }' n=3

In one line:

 <file awk '/^category$/ { v = $0; while(!/^done$/) { if(!getline) exit; v = v ORS $0 } if(++nr == n) print v }' n=3
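One of those caveats is that getline returns -1 on a read error. Since !getline is false for -1, the loop above would never exit in that case. The following sketch tests for a positive return value instead and also stops reading once the nth block has been printed. The function name nth_getline is hypothetical.

```shell
# Hypothetical helper: nth_getline N [FILE...]
nth_getline() {
  n=$1
  shift
  awk -v n="$n" '
    /^category$/ {
      v = $0
      while ($0 !~ /^done$/) {
        if ((getline) <= 0) exit   # 0 = end of input, -1 = read error
        v = v ORS $0
      }
      if (++nr == n) { print v; exit }
    }
  ' "$@"
}

printf '%s\n' category 1 s t done category 2 n d done \
  category 3 r d done category 4 t h done | nth_getline 3
```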

Thor Nov 08 '12 at 8:30

potong · Accepted Answer · 2012-11-08T07:10:03+0000

This may work for you (GNU sed):

 sed '/^category/{x;s/^/X/;/^X\{3\}$/ba;x};d;:a;x;:b;$!{n;/^done/!bb}' file

As an alternative:

 sed -nr '/^category/H;//,/^done/G;s/\n(\n[^\n]*){3}$//p' file

Or if you prefer awk:

 awk '/^category/,/^done/{if(++m==1)n++;if(n==3)print;if(/^done/)m=0}' file