How can I extract a tag attribute value from an HTML file?

Question

I know I shouldn't parse HTML with curl, grep and sed, but I'm looking for an easy approach, even if it isn't very robust.

So, I fetch an HTML file with curl and need the value of a specific attribute from one tag. I use grep to get the line that contains token, which occurs only once. This gives me a whole div:

 <div class="userlinks"> <span class="arrow flleft profilesettings">settings</span> <form class="logoutform" method="post" action="/logout"> <input class="logoutbtn arrow flright" type="submit" value="Log out"> <input type="hidden" name="ltoken" value="a5fc8828a42277538f1352cf9ea27a71"> </form> </div>

How can I get only the content of the value attribute (here, a5fc8828a42277538f1352cf9ea27a71)?

+4

bash regex

tzippy Jul 17 '12 at 13:43

5 answers

One way using sed :

 sed "s/.* value=\"\(.*\)\".*/\1/" file.txt

Results:

 a5fc8828a42277538f1352cf9ea27a71
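Because `.*` is greedy, this pattern captures the last value="..." on the line, which happens to be the token in the sample. The sketch below illustrates that behavior. The helper name `last_value` and the second input line are made up for the demonstration and are not part of the question.

```shell
# Shortened version of the line from the question.
line1='<input type="submit" value="Log out"> <input type="hidden" name="ltoken" value="a5fc8828a42277538f1352cf9ea27a71"> </form>'
# Hypothetical line where another input follows the token.
line2='<input type="hidden" name="ltoken" value="a5fc8828a42277538f1352cf9ea27a71"> <input type="hidden" name="other" value="zzz">'

# Same sed expression as above, reading from stdin.
last_value() {
  sed "s/.* value=\"\(.*\)\".*/\1/"
}

first=$(printf '%s\n' "$line1" | last_value)
second=$(printf '%s\n' "$line2" | last_value)

printf '%s\n' "$first"    # the token: it is the last value on line1
printf '%s\n' "$second"   # zzz: the greedy match skips past the token
```

So the command is fine as long as the token is the last value attribute on the matched line.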

HTH

+8

Steve Jul 17 '12 at 13:48

Use an XPath Expression and Grep the Attribute

You can parse HTML properly from the command line. For example, you can use xgrep to evaluate an XPath expression, and then use GNU sed (or the grep of your choice) to extract only the text you care about. For instance:

 $ xgrep -x '//input[@name="ltoken"][1]/@value' /tmp/foo | sed -rn '/value/ s/.*"([[:xdigit:]]+)"/\1/p'
 a5fc8828a42277538f1352cf9ea27a71

+2

Todd A. Jacobs Jul 17 '12 at 15:15

Another way: awk

 grep "ltoken" file.txt | awk -F"\"" '{print $6}'

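For a different attribute value, change the field number in $6. This assumes the matching input tag is on a line of its own: with " as the field separator, the attribute values are then the even-numbered fields ($2, $4, $6).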
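If the tag shares its line with other tags, as in the single-line div from the question, a fixed field number picks up a different attribute. A minimal sketch of one workaround is to look for the field pair name= / ltoken and print the value that follows. It assumes the value attribute comes directly after name, as it does in the sample. The `html` variable holds a shortened copy of that sample.

```shell
html='<input class="logoutbtn arrow flright" type="submit" value="Log out"> <input type="hidden" name="ltoken" value="a5fc8828a42277538f1352cf9ea27a71"> </form> </div>'

# Split on double quotes, then scan for: name= | ltoken | value= | <token>
token=$(printf '%s\n' "$html" | awk -F'"' '
  {
    for (i = 1; i + 3 <= NF; i++) {
      if ($i ~ /name=$/ && $(i + 1) == "ltoken" && $(i + 2) ~ /value=$/) {
        print $(i + 3)
        exit
      }
    }
  }')

printf '%s\n' "$token"
```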

+2

Azi Jan 22 '15 at 2:41

One problem with the xgrep solution is that it expects valid XML, and the HTML provided is not valid XML because the input elements are unclosed. xmllint has an --html option that switches to its HTML parser, and the XPath string() function retrieves the value directly, so sed is not needed.

 $ xmllint --html --xpath 'string(//input[@name="ltoken"][1]/@value)' foo
 a5fc8828a42277538f1352cf9ea27a71

0

Jasper krijgsman Jan 28 '14 at 12:13

perreal · Accepted Answer · 2012-07-17T13:51:56+0000

No grep needed:

 sed -n '/token/s/.*name="ltoken"\s\+value="\([^"]\+\).*/\1/p' input_file
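The `\s` and `\+` escapes are GNU sed extensions. On other sed implementations, a rough equivalent can be written with POSIX bracket expressions and interval syntax. The sketch below wraps it in a function that reads HTML on stdin. The name `extract_ltoken` is a hypothetical helper, and the `html` variable is a shortened copy of the sample line.

```shell
# Hypothetical helper: prints the ltoken value found on stdin, if any.
extract_ltoken() {
  # [[:space:]]\{1,\} stands in for \s\+ and [^"]\{1,\} for [^"]\+
  sed -n 's/.*name="ltoken"[[:space:]]\{1,\}value="\([^"]\{1,\}\)".*/\1/p'
}

html='<form class="logoutform" method="post" action="/logout"> <input type="hidden" name="ltoken" value="a5fc8828a42277538f1352cf9ea27a71"> </form>'

token=$(printf '%s\n' "$html" | extract_ltoken)
printf '%s\n' "$token"
```

It could then be fed straight from curl, for example `curl -s "$url" | extract_ltoken`, where `$url` stands for whatever page is being fetched.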
