How can I extract a tag attribute value from an HTML file?

I know you shouldn't parse HTML with curl, grep and sed, but I'm looking for an easy approach, not necessarily a robust one.

I fetch an HTML file with curl and need the value of a specific attribute from one tag. I use grep to get the line that contains the word token, which occurs only once. This gives me a whole div:

 <div class="userlinks"> <span class="arrow flleft profilesettings">settings</span> <form class="logoutform" method="post" action="/logout"> <input class="logoutbtn arrow flright" type="submit" value="Log out"> <input type="hidden" name="ltoken" value="a5fc8828a42277538f1352cf9ea27a71"> </form> </div> 

How can I get only the value attribute (for example, "a5fc8828a42277538f1352cf9ea27a71")?

5 answers

No grep needed:

 sed -n '/token/s/.*name="ltoken"\s\+value="\([^"]\+\).*/\1/p' input_file 
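As a minimal sketch of how this command behaves on the line from the question: the temporary file and the variable names below are assumptions for illustration only. Note that \s and \+ are GNU sed extensions, so the second command is a variant restricted to POSIX basic regular expressions that should behave the same on this input.

```shell
# Scratch file holding the line from the question (illustrative only).
input_file=$(mktemp)
cat > "$input_file" <<'EOF'
 <div class="userlinks"> <span class="arrow flleft profilesettings">settings</span> <form class="logoutform" method="post" action="/logout"> <input class="logoutbtn arrow flright" type="submit" value="Log out"> <input type="hidden" name="ltoken" value="a5fc8828a42277538f1352cf9ea27a71"> </form> </div>
EOF

# -n suppresses automatic printing; /token/ restricts the substitution to
# lines containing "token"; the trailing p prints only lines where the
# substitution actually happened.
token=$(sed -n '/token/s/.*name="ltoken"\s\+value="\([^"]\+\).*/\1/p' "$input_file")

# Same idea without the GNU-only escapes: "one or more" is spelled out
# by repeating the bracket expression.
token_posix=$(sed -n '/token/s/.*name="ltoken"[[:space:]][[:space:]]*value="\([^"][^"]*\).*/\1/p' "$input_file")

echo "$token"
echo "$token_posix"
rm -f "$input_file"
```

Because the pattern anchors on name="ltoken", the earlier value="Log out" attribute on the same line is not picked up.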

One way using sed :

 sed "s/.* value=\"\(.*\)\".*/\1/" file.txt 

Results:

 a5fc8828a42277538f1352cf9ea27a71 
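A small sketch of why this works and where it can surprise: the leading .* is greedy, so the match settles on the last value= on the line, and lines without a match are printed unchanged because neither -n nor the p flag is used. The two-line sample input and the variable names are made up for illustration.

```shell
# Hypothetical two-line input: one line without any value attribute,
# one line with two value attributes.
sample='first line
<input value="Log out"> <input name="ltoken" value="abc123">'

# As written in the answer: non-matching lines pass through untouched.
all_lines=$(printf '%s\n' "$sample" | sed "s/.* value=\"\(.*\)\".*/\1/")

# With -n and the p flag, only lines where the substitution happened are printed.
only_match=$(printf '%s\n' "$sample" | sed -n "s/.* value=\"\(.*\)\".*/\1/p")

printf '%s\n' "$all_lines"
printf '%s\n' "$only_match"
```

On a file that holds more than the single grep result, the -n ... p form is probably the one you want.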

HTH


Use an XPath Expression, Then Grep the Attribute

You can parse HTML properly from the command line. For example, you can use xgrep to evaluate an XPath expression, and then use GNU sed (or the grep of your choice) to extract only the text you care about. For instance:

 $ xgrep -x '//input[@name="ltoken"][1]/@value' /tmp/foo | sed -rn '/value/ s/.*"([[:xdigit:]]+)"/\1/p'
 a5fc8828a42277538f1352cf9ea27a71

Another way: awk

 grep "ltoken" file.txt | awk -F"\"" '{print $6}' 

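With " as the field separator, the n-th quoted string on the line is field number 2n, so $6 is the third quoted value. That is the token only when the matched line contains just the hidden input tag. For a different attribute value, or when more quoted attributes come before it on the same line, increase or decrease the 6 in $6 accordingly.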
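The field numbering can be made visible with a short sketch. The first command lists every even-numbered field for a line holding only the hidden input tag. The second is one possible way, not part of the original answer, to avoid counting by hand: it looks for the field that follows name="ltoken" and value=. The variable names are illustrative.

```shell
# A line containing only the hidden input tag.
line='<input type="hidden" name="ltoken" value="a5fc8828a42277538f1352cf9ea27a71">'

# Print each quoted string together with its field number (2, 4, 6, ...).
printf '%s\n' "$line" | awk -F'"' '{ for (i = 2; i <= NF; i += 2) print i ": " $i }'

# The whole div from the question on one line: here the token is no longer $6.
div='<div class="userlinks"> <span class="arrow flleft profilesettings">settings</span> <form class="logoutform" method="post" action="/logout"> <input class="logoutbtn arrow flright" type="submit" value="Log out"> <input type="hidden" name="ltoken" value="a5fc8828a42277538f1352cf9ea27a71"> </form> </div>'

# Find the field sequence  ...name=  ltoken  ...value=  <token>  and print the token.
token=$(printf '%s\n' "$div" | awk -F'"' '{
  for (i = 1; i + 3 <= NF; i++)
    if ($i ~ /name=$/ && $(i + 1) == "ltoken" && $(i + 2) ~ /value=$/) {
      print $(i + 3)
      exit
    }
}')
echo "$token"
```

The second form still assumes the name attribute comes directly before value, as it does in the question's markup.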


One problem with the xgrep solution is that it expects valid XML. The provided HTML is not well-formed XML because the input elements are unclosed. xmllint has an --html option that switches to its HTML parser, and XPath's string() function retrieves the value without needing sed.

 $ xmllint --html --xpath 'string(//input[@name="ltoken"][1]/@value)' foo
 a5fc8828a42277538f1352cf9ea27a71