Sed regexp multiline - replace HTML

I am trying to replace multiple lines with sed on a Linux system.

Here is my file:

 <!-- PAGE TAG -->
 DATA1
 DATA2
 DATA3
 DATA4
 DATA5
 DATA6
 <div id="DATA"></div>
 DATA8
 DATA9
 <!-- PAGE TAG -->

Here are the attempts I made, all of which failed:

 sed -n '1h;1!H;${;g;s/<!-- PAGE TAG -->.*<!-- PAGE TAG -->//g;p;}'
 sed -n '1!N; s/<!-- PAGE TAG -->.*<!-- PAGE TAG -->// p'
 sed -i 's|<!--[^>]*-->[^+]+<!--[^>]*-->||g'
 sed -i 's|/\/\/<!-- PAGE TA -->/,/\/\/<!-- PAGE TA -->||g'

Everything between the two <!-- PAGE TAG --> markers should be replaced.

This question is similar to "sed multiline replace".


While @nhahtdh's answer is correct for your original question, this solution covers what you asked in your comments, replacing the block instead of deleting it:

 sed '
 /<!-- PAGE TAG -->/,/<!-- PAGE TAG -->/ {
     1 {
         s/^.*$/Replace Data/
         b
     }
     d
 }
 '

You can read it like this:

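/<!-- PAGE TAG -->/,/<!-- PAGE TAG -->/ → for every line from a line matching this regular expression through the next line matching it, inclusive

1 { → for line 1 of the input (1 is an absolute line number, not "the first line of the range", so this works only because the file starts with the tag)

s/^.*$/Replace Data/ → replace the whole line with Replace Data

b → branch to the end of the script (behaves like a break in this case), so the replaced line is printed and the d below is skipped

d → otherwise delete the line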
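To see why the 1 address matters, here is a minimal sketch assuming a POSIX shell and GNU sed; the sample() helper and its lines (including the leading before line) are made up for illustration. In the script above, 1 is an absolute line number (line 1 of the input), not the first line of the range, so when the tagged block does not start on line 1 the block is deleted without Replace Data being printed. The standard c (change) command avoids that dependency: with a two-address range it deletes every line in the range and prints its text once, at the end of the range.

```shell
# Hypothetical input: one line before the tagged block, one line after it.
sample() {
  printf '%s\n' 'before' '<!-- PAGE TAG -->' 'DATA1' 'DATA2' '<!-- PAGE TAG -->' 'after'
}

# The answer's script: "1 { ... }" only fires on input line 1, which here is
# "before" (outside the range), so every line of the range is just deleted.
with_line1=$(sample | sed '
/<!-- PAGE TAG -->/,/<!-- PAGE TAG -->/ {
    1 {
        s/^.*$/Replace Data/
        b
    }
    d
}
')

# Alternative: "c" deletes each line of the range and prints its text once,
# at the end of the range, regardless of which line the range starts on.
with_c=$(sample | sed '/<!-- PAGE TAG -->/,/<!-- PAGE TAG -->/c\
Replace Data')

printf '%s\n' "$with_line1"   # before, after
printf '%s\n' "$with_c"       # before, Replace Data, after
```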

With GNU sed you can put most scripts like this one on a single line by adding a semicolon after each command (not recommended if you want to read it later):

 sed '/<!-- PAGE TAG -->/,/<!-- PAGE TAG -->/ { 1 { s/^.*$/Replace Data/; b; }; d; };' 
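One way to check that the single-line form behaves like the multi-line script is to run both on the same input. This is a sketch assuming a POSIX shell and GNU sed (the single-line form relies on GNU sed accepting a ; right after b); the sample() helper mimics the question's file, which starts with the tag, and is hypothetical.

```shell
# Hypothetical sample mirroring the question's file (it starts with the tag).
sample() {
  printf '%s\n' '<!-- PAGE TAG -->' 'DATA1' 'DATA2' '<div id="DATA"></div>' 'DATA9' '<!-- PAGE TAG -->'
}

# Multi-line form of the answer's script.
multi_line=$(sample | sed '
/<!-- PAGE TAG -->/,/<!-- PAGE TAG -->/ {
    1 {
        s/^.*$/Replace Data/
        b
    }
    d
}
')

# Single-line GNU sed form of the same script.
one_line=$(sample | sed '/<!-- PAGE TAG -->/,/<!-- PAGE TAG -->/ { 1 { s/^.*$/Replace Data/; b; }; d; };')

printf '%s\n' "$multi_line"   # Replace Data
printf '%s\n' "$one_line"     # Replace Data
```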

As a side note, try to be as specific as possible in your question. "replaced / deleted" reads as "replaced OR deleted"; if you want the block replaced, just say so. This helps both those of us trying to answer your question and future readers who run into the same problem.


Adapting the answer from the question you linked, this should work:

 sed '/<!-- PAGE TAG -->/,/<!-- PAGE TAG -->/d' 

The command has the form [2addr]d, where the two addresses /<!-- PAGE TAG -->/ and /<!-- PAGE TAG -->/ are separated by a comma. d deletes every line from the line that matches the first address through the line that matches the second address, inclusive. (This means that text outside the tags, but on the same line as a tag, is deleted as well.)
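The note about text on the same line as a tag is easy to see with a small sketch (POSIX shell and sed assumed; the sample() helper and its head, tail and keep me lines are hypothetical):

```shell
# Hypothetical sample: extra text shares a line with each tag.
sample() {
  printf '%s\n' 'keep me' 'head <!-- PAGE TAG --> DATA1' 'DATA2' 'DATA3 <!-- PAGE TAG --> tail' 'keep me too'
}

# Delete every line from the first tag line through the next tag line.
deleted=$(sample | sed '/<!-- PAGE TAG -->/,/<!-- PAGE TAG -->/d')

printf '%s\n' "$deleted"   # keep me, keep me too
```

Both head and tail disappear, because d always works on whole lines.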


Although Tim Pot answered the question, I'll post this here in case anyone needs to replace a multi-line pattern:

 sed -n '1h; 1!H; ${g; s/<!-- PAGE TAG -->[^!]*<!-- PAGE TAG -->//g; p;}' 

I adapted this solution from an existing source, where most of the commands are explained.

The regular expression here is a bit hacky, since it assumes that the text between the two page tags contains no ! character. Without this assumption I cannot limit how much the regular expression matches: a greedy .* would run from the first page tag all the way to the last one in the file, and sed has no lazy quantifier (as far as I know).
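The difference between a greedy .* and [^!]* shows up once a file contains more than one tagged block. A minimal sketch, assuming a POSIX shell and GNU sed (where . and [^!] also match the newlines that H puts into the hold space); the sample() helper and its A, B and middle lines are made up:

```shell
# Hypothetical input: two separate tagged blocks with a line in between.
sample() {
  printf '%s\n' '<!-- PAGE TAG -->' 'A' '<!-- PAGE TAG -->' 'middle' '<!-- PAGE TAG -->' 'B' '<!-- PAGE TAG -->'
}

# Slurp the whole file into the pattern space, then substitute once at the end.
# Greedy: .* matches from the first tag to the last one.
greedy=$(sample | sed -n '1h; 1!H; ${g; s/<!-- PAGE TAG -->.*<!-- PAGE TAG -->//g; p;}')

# [^!]* cannot cross the "!" of another tag, so each pair is matched separately.
lazy=$(sample | sed -n '1h; 1!H; ${g; s/<!-- PAGE TAG -->[^!]*<!-- PAGE TAG -->//g; p;}')

printf '[%s]\n' "$greedy"   # []
printf '[%s]\n' "$lazy"
```

The greedy version removes middle along with both blocks; with [^!]* only the text inside each pair of tags goes away, leaving empty lines where the two blocks were.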

This solution will not remove the text before the tag, even if it is on the same line as the tag.
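A short sketch of that behaviour, with text on the same line as each tag (POSIX shell and GNU sed assumed; the sample() helper and the head and tail words are hypothetical):

```shell
# Hypothetical input: text before the first tag and after the second, on the tag lines.
sample() {
  printf '%s\n' 'head <!-- PAGE TAG --> DATA1' 'DATA2 <!-- PAGE TAG --> tail'
}

# Only the characters from the first tag through the second tag are removed.
kept=$(sample | sed -n '1h; 1!H; ${g; s/<!-- PAGE TAG -->[^!]*<!-- PAGE TAG -->//g; p;}')

printf '%s\n' "$kept"   # head  tail (two spaces: one from each side of the removed span)
```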

