How to match content between HTML tags with an attribute using grep?

Question

What regular expression should I use with grep to match the text between a <div class="Message"> tag and its closing </div> in an HTML file?

regex grep

Albz Nov 26 '12 at 14:11

3 answers

You can do this by specifying a regex:

 grep -E "^<div class=\"Message\">.*</div>$" input_files

Note that this only prints matches that fit on a single line. If your tag spans multiple lines, you can try:

 tr '\n' ' ' < input_file | grep -E "^<div class=\"Message\">.*</div>$"
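To see what the anchored single-line pattern actually matches, here is a small self-contained sketch. The helper name `message_lines` and the sample input are illustrative assumptions, not part of the answer; the pattern is written in single quotes so the inner double quotes need no escaping.

```shell
#!/bin/sh
# Hypothetical helper wrapping the anchored pattern from this answer.
# Reads the files given as arguments, or standard input if there are none.
message_lines() {
  grep -E '^<div class="Message">.*</div>$' "$@"
}

# Sample input (an assumption for this demo). Only the first line is printed:
# the second line is indented, so the ^ anchor fails, and the third line
# closes its div on a later line, so the $ anchor fails.
printf '%s\n' \
  '<div class="Message">Hello</div>' \
  '  <div class="Message">Indented</div>' \
  '<div class="Message">Split' \
  'across lines</div>' | message_lines
```

Dropping the `^` and `$` anchors would also catch the indented tag, at the cost of matching lines that merely contain the element somewhere.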

sampson-chen Nov 26 '12 at 14:15

You cannot do this reliably with grep alone. You need to parse the HTML with an HTML parser.

What if the HTML contains something like:

 <!-- <div class="Message">blah blah</div> -->

You will get a false hit on that commented-out code.
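The false hit is easy to reproduce. This sketch feeds the commented-out line to a plain grep pattern; the helper name `grep_message` and its `[^<]*` body pattern are illustrative assumptions. grep has no notion of HTML comments, so the element inside `<!-- ... -->` is reported as if it were live markup.

```shell
#!/bin/sh
# Hypothetical helper: print each <div class="Message">...</div> element whose
# body contains no further tags ([^<]* stops at the next '<').
grep_message() {
  grep -oE '<div class="Message">[^<]*</div>' "$@"
}

# The commented-out markup from the example: an HTML parser would skip it,
# but grep still prints the element.
printf '%s\n' '<!-- <div class="Message">blah blah</div> -->' | grep_message
```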

Consider using xmlgrep from the XML::Grep Perl module, as described in the question "Retrieve html file header using grep".

Andy Lester Nov 26 '12 at 15:55

Steve · Accepted Answer · 2012-11-26T14:32:11+0000

Here is one way, using GNU grep:

 grep -oP '(?<=<div class="Message">).*?(?=</div>)' file
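A small demo of the lookaround approach: `(?<=...)` and `(?=...)` must match but are not part of the match, so `-o` prints only the text between the tags, and the lazy `.*?` stops at the first `</div>`, so two elements on one line come out separately. The helper name `message_text` and the sample line are assumptions for illustration; `-P` requires a GNU grep built with PCRE support.

```shell
#!/bin/sh
# Hypothetical helper: print only the text between <div class="Message"> and
# the nearest following </div>, one match per output line (needs GNU grep -P).
message_text() {
  grep -oP '(?<=<div class="Message">).*?(?=</div>)' "$@"
}

# Two elements on the same line; the lazy .*? yields each body separately,
# printing "first" and then "second" on their own lines.
printf '%s\n' '<div class="Message">first</div><div class="Message">second</div>' \
  | message_text
```

Because the match ends at the nearest `</div>`, a nested `<div>` inside the message cuts the result short; that is one of the cases where an HTML parser is the safer tool.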

If your tags span multiple lines, try:

 < file tr -d '\n' | grep -oP '(?<=<div class="Message">).*?(?=</div>)'
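Here is a sketch of the multi-line variant on a sample element spread over four lines (the sample text and the helper name `message_text_multiline` are assumptions). `tr -d '\n'` deletes the newlines outright, so words separated only by a line break get glued together; in this sample the third line starts with a space, which keeps "Hello," and "world" apart.

```shell
#!/bin/sh
# Hypothetical helper: join standard input into a single line, then extract
# the text between <div class="Message"> and the nearest </div> (GNU grep -P).
message_text_multiline() {
  tr -d '\n' | grep -oP '(?<=<div class="Message">).*?(?=</div>)'
}

# One element spanning four lines; prints "Hello, world".
printf '%s\n' '<div class="Message">' 'Hello,' ' world' '</div>' \
  | message_text_multiline
```

Using `tr '\n' ' '` instead turns each line break into a space, which keeps words apart at the cost of extra whitespace around the extracted text.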
