Getting text inside HTML tag in local file using grep

Question

Possible duplicate:
RegEx match open tags except XHTML self-contained tags

Excerpt from the input file

<TD class="clsTDLabelWeb" width="28%">Municipality:&nbsp;</TD> <TD style="WIDTH: 394px" class="clsTDLabelSm" colSpan="5"> <span id="DInfo1_Municipality">JUPITER</span></TD>

My regex

 (?<=<span id="DInfo1_Municipality">)([^</span>]*)

I have an HTML file saved on disk. I would like to use grep to search the file and print the contents of a specific span, although I am not sure grep is the right tool for this. When I run grep on the file with the expression read from another file (so that I don't have to worry about escaping special characters), it outputs nothing. I tested the expression in RegExr and it matches "JUPITER", which is exactly what I want returned. Thank you so much for your help!

Desired output

 JUPITER

html bash regex grep screen-scraping

Cody Jackson Aug 29 '10 at 1:01

3 answers

Dennis Williamson · Answer 1 · 2010-08-29T05:12:44+0000

Try:

 sed -n 's|.*<span id="DInfo1_Municipality">\([^<]*\)</span>.*|\1|p' file

or with GNU grep and your regex:

 grep -Po '(?<=<span id="DInfo1_Municipality">)([^</span>]*)' file
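One caveat about the pattern itself: `[^</span>]*` is a negated character class, not "anything up to `</span>`". It stops at the first occurrence of any one of the characters `<`, `/`, `s`, `p`, `a`, `n` or `>`, so it only returns the whole of "JUPITER" because that value is upper case. A minimal sketch of the difference, assuming GNU grep built with PCRE support (`-P`); the function names and the "Palm Beach" sample value are made up for illustration:

```shell
# Hypothetical helpers; both assume GNU grep with PCRE support (-P).
extract_with_class() {
    # [^</span>] is a set: it rejects any one of the characters < / s p a n >
    grep -Po '(?<=<span id="DInfo1_Municipality">)[^</span>]*' "$1"
}

extract_up_to_tag() {
    # [^<]* takes everything before the next "<", i.e. up to the closing tag
    grep -Po '(?<=<span id="DInfo1_Municipality">)[^<]*' "$1"
}

# Hypothetical sample line with lowercase letters in the value.
sample=$(mktemp)
printf '%s\n' '<span id="DInfo1_Municipality">Palm Beach</span></TD>' > "$sample"

extract_with_class "$sample"   # cut short at the first lowercase "a"
extract_up_to_tag "$sample"    # full text of the span
rm -f "$sample"
```

Using `[^<]*` keeps the lookbehind approach but no longer depends on which letters happen to appear in the value.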

Paul Creasey · Answer 2 · 2010-08-29T01:10:02+0000

Standard grep (basic and extended regular expressions) does not support this type of regular expression (lookbehind assertions), and it's a very poor tool for this job. The following works for the example given, but it will break in many situations.

 grep -io "<span id=\"DInfo1_Municipality\">.*</span>" file.html | grep -io ">[^<]*" | grep -io '[^>]*'

Something crazy like that; not a good idea.

ghostdog74 · Answer 3 · 2010-08-29T02:43:43+0000

 sed -n '/DInfo1_Municipality/s/<\/span.*//p' file | sed 's/.*>//'
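For anyone puzzling over how this works: the first sed prints only lines that mention the id, after deleting everything from `</span` to the end of the line; the second sed then deletes everything up to and including the last remaining `>`, which leaves just the text. A small sketch that wraps the same pipeline in a function (the name `extract_municipality` is hypothetical) and runs it on the line from the question:

```shell
# Hypothetical wrapper around the two-step sed pipeline above.
extract_municipality() {
    # Step 1: on lines mentioning the id, drop "</span" through end of line and print.
    # Step 2: drop everything up to and including the last ">" that remains.
    sed -n '/DInfo1_Municipality/s/<\/span.*//p' "$1" | sed 's/.*>//'
}

# Sample input: the excerpt from the question, written to a temporary file.
sample=$(mktemp)
printf '%s\n' '<TD class="clsTDLabelWeb" width="28%">Municipality:&nbsp;</TD> <TD style="WIDTH: 394px" class="clsTDLabelSm" colSpan="5"> <span id="DInfo1_Municipality">JUPITER</span></TD>' > "$sample"

extract_municipality "$sample"
rm -f "$sample"
```

This relies on the span and its closing tag being on the same line, and on the value itself not containing `>`.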
