Getting text inside HTML tag in local file using grep

Possible duplicate:
Open RegEx tags, except standalone XHTML tags

Excerpt from the input file

<TD class="clsTDLabelWeb" width="28%">Municipality:&nbsp;</TD> <TD style="WIDTH: 394px" class="clsTDLabelSm" colSpan="5"> <span id="DInfo1_Municipality">JUPITER</span></TD> 

My regex

 (?<=<span id="DInfo1_Municipality">)([^</span>]*) 

I have an HTML file saved on disk. I would like to use grep to search the file and display the contents of a particular span, although I am not sure grep is the right tool for this. When I run grep on the file with the expression read from another file (so that I do not have to worry about escaping special characters), it outputs nothing. I tested the expression in RegExr and it matches "JUPITER", which is exactly what I want returned. Thank you so much for your help!

Desired output

 JUPITER 
html bash regex grep screen-scraping
3 answers

Try:

 sed -n 's|.*<span id="DInfo1_Municipality">\([^<]*\)</span>.*|\1|p' file 

or with GNU grep and your regex:

 grep -Po '(?<=<span id="DInfo1_Municipality">)([^</span>]*)' file 
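
One caveat about the regex itself: `[^</span>]` is a character class, so it excludes the single characters `<`, `/`, `s`, `p`, `a`, `n` and `>` rather than the string `</span>`. It happens to work for JUPITER because the value is all uppercase. A minimal sketch of the difference, assuming GNU grep built with `-P` support; the second input line and its value are made up purely for illustration:

```shell
# Hypothetical sample file: the first line is the excerpt from the question,
# the second line is an invented value used only to show the pitfall.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<TD class="clsTDLabelWeb" width="28%">Municipality:&nbsp;</TD> <TD style="WIDTH: 394px" class="clsTDLabelSm" colSpan="5"> <span id="DInfo1_Municipality">JUPITER</span></TD>
<span id="DInfo1_Municipality">Palm Beach</span>
EOF

# Original class: excludes the single characters < / s p a n >,
# so the match stops at the first lowercase "a" in the second value.
original=$(grep -Po '(?<=<span id="DInfo1_Municipality">)([^</span>]*)' "$sample")

# Excluding only "<" reads everything up to the next tag.
fixed=$(grep -Po '(?<=<span id="DInfo1_Municipality">)[^<]*' "$sample")

printf 'original:\n%s\nfixed:\n%s\n' "$original" "$fixed"
rm -f "$sample"
```

Here `-P` turns on Perl-compatible regexes (needed for the lookbehind) and `-o` prints only the matched text instead of the whole line.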

Standard grep (without GNU's -P option) does not support this type of regular expression (lookbehind assertions), and it is a very poor tool for this job. The following pipeline works on the example above, but it breaks in many situations.

 grep -io "<span id=\"DInfo1_Municipality\">.*</span>" file.html | grep -io ">[^<]*" | grep -io '[^>]*' 

Something this convoluted is not a good idea.
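
To make "breaks in many situations" concrete, here is a rough sketch of one such case, assuming GNU grep. The second span (its id and its text) is invented for the demonstration: as soon as another span follows on the same line, the greedy `.*` swallows it too.

```shell
# Hypothetical input: the span from the question followed by a second,
# invented span on the same line.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<span id="DInfo1_Municipality">JUPITER</span> <span id="Other">OTHER</span>
EOF

# The greedy ".*" runs to the last </span> on the line, so text from the
# second span leaks into the output.
greedy=$(grep -io "<span id=\"DInfo1_Municipality\">.*</span>" "$sample" | grep -io ">[^<]*" | grep -io '[^>]*')

# Using [^<]* instead of .* keeps the first match inside the target span.
bounded=$(grep -io "<span id=\"DInfo1_Municipality\">[^<]*</span>" "$sample" | grep -io ">[^<]*" | grep -io '[^>]*')

printf 'greedy:\n%s\nbounded:\n%s\n' "$greedy" "$bounded"
rm -f "$sample"
```

Even the bounded variant still fails on nested tags or on a span that wraps across lines, which is why a real HTML parser is the safer choice.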

 sed -n '/DInfo1_Municipality/s/<\/span.*//p' file | sed 's/.*>//' 