I have an HTML page generated by an existing tool; I cannot change that tool's output.
However, I would like to use xmllint with its --xpath option to extract several specific pieces of information from a downloaded web page. The problem is that the page starts with:
<html lang=en><head>...
And xmllint produces errors almost immediately:
html.out:2: parser error : AttValue: " or ' expected
<html lang=en><head>
           ^
The problem is probably the missing quotes around the value of the lang attribute: HTML allows unquoted attribute values, but XML requires every attribute value to be quoted. The rest of the page has the same problem, although only sporadically.
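To illustrate that cause, here is a minimal Python sketch (standard library only; the helper names xml_accepts, StartTagAttrs and html_attrs are made up for this example) that feeds the same start tag to a strict XML parser and to a lenient HTML parser. Assuming the unquoted value is the only defect, the XML parser should reject the fragment while the HTML parser reads lang as en.

```python
# Same fragment, two parsers: strict XML (expat via ElementTree) vs. lenient HTML.
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

FRAGMENT = "<html lang=en><head><title>t</title></head></html>"


def xml_accepts(markup):
    """Return True if ElementTree's XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


class StartTagAttrs(HTMLParser):
    """Record the attribute list of the most recent start tag of each name."""

    def __init__(self):
        super().__init__()
        self.attrs = {}

    def handle_starttag(self, tag, attrs):
        self.attrs[tag] = attrs


def html_attrs(markup):
    """Run the lenient HTML parser over markup and return the recorded attributes."""
    parser = StartTagAttrs()
    parser.feed(markup)
    parser.close()
    return parser.attrs


if __name__ == "__main__":
    print(xml_accepts(FRAGMENT))         # False: the unquoted value is not well-formed XML
    print(html_attrs(FRAGMENT)["html"])  # [('lang', 'en')]
```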
Almost every browser parses this just fine, so how can I convince xmllint to do the same? I would like to avoid an intermediate step that “fixes” the file. Instead, I would like to either:
1) find a flag or option that makes the parser accept this, or
2) use another tool. (But which one? xmllint has always been my go-to for XPath queries on the command line.)
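For option 2, one possible standard-library route is sketched below in Python; it is only a rough sketch, not a drop-in replacement for xmllint. html.parser tolerates unquoted attributes, and its events can be fed into an ElementTree, which understands only a limited subset of XPath 1.0 (for example, no text() steps and no following-sibling axis). LenientTreeParser, html_xpath, the synthetic "document" root and the sample page are all hypothetical. The tree repair is much cruder than a browser's: it closes void elements immediately and unwinds unclosed tags at the next matching end tag, but it does not implement HTML5 rules such as a new <p> implicitly closing the previous one.

```python
# Hypothetical stand-in for an XPath query on HTML with unquoted attributes.
# Standard library only: html.parser for lenient parsing, ElementTree for queries.
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# HTML elements that never take a closing tag.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "param", "source", "track", "wbr"}


class LenientTreeParser(HTMLParser):
    """Turn lenient HTML parse events into an ElementTree under a synthetic root."""

    def __init__(self):
        super().__init__()
        self.builder = ET.TreeBuilder()
        self.builder.start("document", {})  # synthetic root: the tree always has one top
        self.open_tags = []  # HTML elements currently open, innermost last

    def handle_starttag(self, tag, attrs):
        # Valueless attributes such as <input disabled> arrive with value None.
        self.builder.start(tag, {name: value or "" for name, value in attrs})
        if tag in VOID_ELEMENTS:
            self.builder.end(tag)
        else:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: open and close immediately.
        self.builder.start(tag, {name: value or "" for name, value in attrs})
        self.builder.end(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag: ignore it (a simplification)
        # Close any elements left open inside this one, then the element itself.
        while True:
            name = self.open_tags.pop()
            self.builder.end(name)
            if name == tag:
                break

    def handle_data(self, data):
        self.builder.data(data)

    def close_tree(self):
        self.close()
        while self.open_tags:  # close anything the page never closed
            self.builder.end(self.open_tags.pop())
        self.builder.end("document")
        return self.builder.close()


def html_xpath(markup, path):
    """Parse markup leniently, then run an ElementTree XPath-subset query."""
    parser = LenientTreeParser()
    parser.feed(markup)
    return parser.close_tree().findall(path)


if __name__ == "__main__":
    page = ("<!DOCTYPE html>\n<html lang=en><head><title>Report</title>"
            "<meta name=generator content=sometool></head>"
            "<body><p class=item>one<p class=item>two</body></html>")
    print([p.text for p in html_xpath(page, ".//p[@class='item']")])
```

Paths are evaluated relative to the synthetic root, so "html" selects the top-level html element and ".//title" searches the whole page.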
Using the standalone xpath tool instead, I get:
> xpath html.out '//myquery...'
not well-formed (invalid token) at line 2, column 11, ...
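That wording matches the error messages of expat, the XML parser wrapped by Perl's XML::Parser, which the xpath command most likely uses through XML::XPath. Assuming so, the reported position can be reproduced with Python's expat binding. The first line of SAMPLE is a guess (a DOCTYPE), since line 1 of html.out is not shown, and first_xml_error is a made-up helper. expat counts columns from 0, so column 11 on line 2 points at the "e" of en, the first character after lang=.

```python
# Reproduce the reported error position with expat (assumed to be xpath's parser).
import xml.parsers.expat

# Hypothetical two-line input: line 1 is a guess, line 2 matches the question.
SAMPLE = "<!DOCTYPE html>\n<html lang=en><head></head></html>\n"


def first_xml_error(markup):
    """Return (message, line, 0-based column) of expat's first error, or None."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(markup, True)
    except xml.parsers.expat.ExpatError as err:
        return xml.parsers.expat.errors.messages[err.code], err.lineno, err.offset
    return None


if __name__ == "__main__":
    print(first_xml_error(SAMPLE))  # ('not well-formed (invalid token)', 2, 11)
```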
Craig otis