Sed recipe: how to process text between two patterns that can be either on the same line or on different lines?

Question

Let's say we want to make some substitutions only between some patterns; let them be <a> and </a> for clarity... (all right, all right, they are start and end! Jeez!)

So, I know what to do if start and end always occur on the same line: just write a suitable regular expression.
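For the same-line case, a minimal sketch could look like the following. It assumes GNU sed (the \U and \E escapes in the replacement are GNU extensions), and the sample line is made up for illustration.

```shell
# Upper-case only the text between "start" and "end" on a single line.
# Assumption: GNU sed, because \U and \E are not part of POSIX sed.
same_line=$(printf '%s\n' 'keep start change me end keep' |
  sed 's/\(start\)\(.*\)\(end\)/\1\U\2\E\3/')
printf '%s\n' "$same_line"
# prints: keep start CHANGE ME end keep
```

Note that .* is greedy, so with several end markers on one line the match runs to the last one.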

I also know what to do if they are guaranteed to be on different lines, I don't care about anything in the line containing end, and I am fine with applying all the commands to the part of the start line that comes before start: just specify the address range as /start/,/end/.
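To make that limitation concrete, here is a small sketch with made-up input. The substitution is applied to every line in the range in full, so the x before start and the x after end are changed as well.

```shell
# Apply s/x/X/g to every line from the first /start/ match to the next /end/ match.
# The five input lines are hypothetical sample data.
range_out=$(printf '%s\n' 'x before' 'x start x' 'x middle' 'x end x' 'x after' |
  sed '/start/,/end/s/x/X/g')
printf '%s\n' "$range_out"
# prints:
# x before
# X start X
# X middle
# X end X
# x after
```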

This, however, is not very useful. What should I do if I need to do smarter work, for example, make changes inside the {...} block?

One thing I can think of is to break the input at { and } before processing and put it back together afterwards:

 sed 's/{\|}/\n&\n/g' input | sed 'main stuff' | sed ':a $!{N;ba}; s/\n\(}\|{\)\n/\1/g'

Another option is the opposite:

 cat input | tr '\n' '#' | sed 'whatever; s/#/\n/g'

Both of them are ugly, mainly because the operations are not confined to a single command. The second is even worse, because you have to use some character or substring as a "newline holder" and assume it is not present in the source text.
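The "newline holder" problem is easy to reproduce. In this sketch (made-up input, GNU sed assumed for \n in the replacement) the first line already contains the holder character, so the round trip splits it in two.

```shell
# Two input lines go in; the first one already contains '#'.
# Assumption: GNU sed, where \n in the replacement inserts a newline.
holder_out=$(printf '%s\n' 'a#b' 'c' | tr '\n' '#' | sed 's/#/\n/g')
printf '%s\n' "$holder_out"
# prints three lines instead of two:
# a
# b
# c
```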

So the question is: are there any better ways, or can the ones mentioned above be optimized? Judging by recent SO questions this is a fairly common task, so I would like to settle on a best practice once and for all.

PS I am mainly interested in pure sed solutions: can the work be done with a single sed invocation and nothing else? Please, no awk, Perl, etc.: this is more of a theoretical question than a "need to get the job done ASAP" one.

sed

Lev levitsky Jun 13 '12 at 22:29

1 answer

potong · Accepted Answer · 2012-06-15T09:36:31+0000

This might work for you:

 # create multiline test data
 cat <<\! >/tmp/a
 > this
 > this { this needs
 > changing to
 > that } that
 > that
 > !
 sed '/{/!b;:a;/}/!{$q;N;ba};h;s/[^{]*{//;s/}.*//;s/this\|that/\U&/g;x;G;s/{[^}]*}\([^\n]*\)\n\(.*\)/{\2}\1/' /tmp/a
 this
 this { THIS needs
 changing to
 THAT } that
 that
 # convert multiline test data to a single line
 tr '\n' ' ' </tmp/a >/tmp/b
 sed '/{/!b;:a;/}/!{$q;N;ba};h;s/[^{]*{//;s/}.*//;s/this\|that/\U&/g;x;G;s/{[^}]*}\([^\n]*\)\n\(.*\)/{\2}\1/' /tmp/b
 this this { THIS needs changing to THAT } that that

Explanation:

Read the data into the pattern space (PS). /{/!b;:a;/}/!{$q;N;ba}
Copy the data to the hold space (HS). h
Strip the non-data from the front and back of the PS. s/[^{]*{//;s/}.*//
Convert the data, e.g. s/this\|that/\U&/g
Swap to the HS and append the converted data. x;G
Replace the old data with the converted data. s/{[^}]*}\([^\n]*\)\n\(.*\)/{\2}\1/
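The steps above can also be laid out as a commented multi-line script. This is only a sketch of the same commands, assuming GNU sed (\U, \| and \n inside a bracket expression); the variable names block_script and block_out are made up for illustration.

```shell
# The one-liner from the answer, one command per line with comments.
# Assumption: GNU sed.
block_script='
# lines without "{" are printed unchanged
/{/!b
# keep appending input lines until the pattern space contains "}"
:a
/}/!{
  $q
  N
  ba
}
# save the untouched text in the hold space
h
# keep only what lies between "{" and "}"
s/[^{]*{//
s/}.*//
# transform the block contents
s/this\|that/\U&/g
# swap the original back in and append the transformed text after a newline
x
G
# put the transformed text between the braces
s/{[^}]*}\([^\n]*\)\n\(.*\)/{\2}\1/
'
block_out=$(printf '%s\n' 'this' 'this { this needs' 'changing to' 'that } that' 'that' |
  sed "$block_script")
printf '%s\n' "$block_out"
```

On the sample data from the answer this prints the same five lines as the one-liner, with this and that upper-cased only inside the braces.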

EDIT:

A more complicated answer, which I think caters for more than one block per line.

 # slurp file into pattern space (PS)
 :a
 $! {
 N
 ba
 }
 # check for presence of \v if so quit with exit value 1
 /\v/q1
 # replace original newlines with \v's
 y/\n/\v/
 # append a newline to PS as a delimiter
 G
 # copy PS to hold space (HS)
 h
 # starting from right to left delete everything but blocks
 :b
 s/\(.*\)\({.*}\).*\n/\1\n\2/
 tb
 # delete any non-block details from the start of the file
 s/.*\n//
 # PS contains only block details
 # do any block processing here eg uppercase this and that
 s/th\(is\|at\)/\U&/g
 # append PS to HS
 H
 # swap to HS
 x
 # replace each original block with its processed one from right to left
 :c
 s/\(.*\){.*}\(.*\)\n\n\(.*\)\({.*}\)/\1\n\n\4\2\3/
 tc
 # delete newlines
 s/\n//g
 # restore original newlines
 y/\v/\n/
 # done!

NB This uses GNU-specific features (for example \U, \| and q with an exit code), but it can be modified to work with generic seds.
