Remove html comment tags with regexp

This is what my text (HTML) file looks like:

 <!--
 |                                  |
 | This is a dummy comment          |
 | please delete me                 |
 | asap                             |
 |                                  |
 | ________________________________ |
 -->
 this is another line in this long dummy html file...
 please do not delete me

I am trying to remove a comment using sed:

 cat file.html | sed 's/.*<!--\(.*\)-->.*//g' 

This does not work :( What am I doing wrong?

Many thanks for your help!

3 answers

Patrickmdnet has the correct answer. Here it is as a one-liner, using extended regular expressions:

 cat file.html | sed -e :a -re 's/<!--.*?-->//g;/<!--/N;//ba' 

Here is a good resource for more information on sed. This sed command is a one-line adaptation of its one-liner #92:

http://www.catonmat.net/blog/sed-one-liners-explained-part-three/
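
The loop in this kind of one-liner can be easier to follow when it is spelled out one command per line. The following is only a sketch, assuming GNU sed; strip_comments is a hypothetical helper name that does not appear in the answer. Keep in mind that sed's POSIX regular expressions have no lazy quantifier, so the match is greedy (as far as I can tell, a Perl-style ".*?" is also greedy in GNU sed): two comments on the same line are removed together with the text between them.

```shell
# strip_comments is a hypothetical helper name used for illustration only.
strip_comments() {
  sed '
    :a
    # Delete every comment that is complete within the pattern space.
    s/<!--.*-->//g
    # An opening tag is still present: append the next line and try again.
    /<!--/{
      N
      ba
    }
  '
}

# Greedy matching: the word "keep" sits between two comments and is lost.
printf '%s\n' 'x <!-- a --> keep <!-- b --> y' | strip_comments
```

If text between comments on one line must survive, a tool that scans for the nearest closing tag is a safer choice than a single greedy substitution.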


One of the problems with your initial attempt is that your regex only handles comments that sit entirely on a single line, because sed processes its input one line at a time. In addition, the leading and trailing ".*" will delete the non-comment text on the same line.
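
Both problems can be reproduced on a small input. The sample text below is made up for illustration, and original is a hypothetical wrapper around the command from the question:

```shell
# Made-up sample: one single-line comment and one comment spanning two lines.
sample='keep this <!-- inline --> and this
<!-- first
second -->
last line'

# The command from the question, unchanged, wrapped in a function.
original() { sed 's/.*<!--\(.*\)-->.*//g'; }

printf '%s\n' "$sample" | original
# Line 1 comes out empty: the leading and trailing ".*" swallowed the text
# around the comment. Lines 2 and 3 are printed unchanged, because neither
# line contains both "<!--" and "-->" on its own.
```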

You are better off using existing code instead of rolling your own.

http://sed.sourceforge.net/grabbag/scripts/strip_html_comments.sed

 #! /bin/sed -f
 # Delete HTML comments
 # ie everything between <!-- and -->
 # by Stewart Ravenhall <stewart.ravenhall@ukonline.co.uk>
 /<!--/!b
 :a
 /-->/!{
   N
   ba
 }
 s/<!--.*-->//

(from http://sed.sourceforge.net/grabbag/scripts/ )
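
One way to try the script is to save its commands to a file and pass that file to sed with -f. This is a sketch, assuming a sed in which "." also matches an embedded newline (GNU sed does); the temporary file and the name strip_with_script are placeholders chosen for this example.

```shell
# Save the commands of the script to a temporary file (the path is arbitrary).
script=$(mktemp)
trap 'rm -f "$script"' EXIT
cat > "$script" <<'EOF'
/<!--/!b
:a
/-->/!{
N
ba
}
s/<!--.*-->//
EOF

# Hypothetical wrapper: reads stdin, or the files given as arguments.
strip_with_script() { sed -f "$script" "$@"; }

printf '%s\n' 'a' '<!-- x' 'y -->' 'b' | strip_with_script
# Prints "a", then an empty line where the comment used to be, then "b".
```

On a real file, the call would look like strip_with_script file.html.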

See this link for different ways to use Perl modules to remove HTML comments (using Regexp::Common, HTML::Parser, or File::Comments). I'm sure there are methods that use other utilities.

http://www.perlmonks.org/?node_id=500603


I think you can do it with awk if you want. Given this input:

 [~]$ more test.txt
 <!--
 An HTML style comment
 -->
 Some other text
 <div>
 <p>blah</p>
 </div>
 <!-- Whoops
 Another comment -->
 <span>Something</span>

awk result:

 [~]$ cat test.txt | awk '/<!--/ {off=1} /-->/ {off=2} {if (off==0) print; if (off==2) off=0}'
 Some other text
 <div>
 <p>blah</p>
 </div>
 <span>Something</span>
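
Note that these rules act on whole lines, so any text that shares a line with a comment is dropped along with it. If that matters, awk can scan each line for the nearest delimiters instead. The following is a rough sketch rather than a tested tool; strip_comments_awk and the variable names are made up for this example, and it uses only the standard index and substr functions.

```shell
# strip_comments_awk is a hypothetical helper name.
strip_comments_awk() {
  awk '
    {
      rest = $0
      out = ""
      touched = in_comment   # nonzero if this line starts inside a comment
      while (length(rest) > 0) {
        if (in_comment) {
          # Skip everything up to and including the nearest "-->".
          i = index(rest, "-->")
          if (i == 0) {
            rest = ""
          } else {
            rest = substr(rest, i + 3)
            in_comment = 0
          }
        } else {
          # Keep everything before the nearest "<!--".
          i = index(rest, "<!--")
          if (i == 0) {
            out = out rest
            rest = ""
          } else {
            out = out substr(rest, 1, i - 1)
            rest = substr(rest, i + 4)
            in_comment = 1
            touched = 1
          }
        }
      }
      # Drop lines that held nothing but comment text; keep all others.
      if (out != "" || !touched) print out
    }
  '
}

printf '%s\n' 'a <!-- x --> b <!-- y --> c' '<!-- open' 'end --> tail' | strip_comments_awk
```

Because each comment ends at the nearest closing tag, the text between two comments on the same line is kept.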