I am parsing an HTML document with XPath, and I want to keep all of the inner HTML tags.
The HTML in question is an unordered list with many list items:

```html
<ul id="adPoint1"><li>Business</li><li>Contract</li></ul>
```
I am parsing the document with the following PHP code:

```php
$dom = new DOMDocument();
@$dom->loadHTML($output);          // @ suppresses libxml parser warnings
$this->xpath = new DOMXPath($dom);
$testDom = $this->xpath->evaluate("//ul[@id='adPoint1']");
$test = $testDom->item(0)->nodeValue;
echo htmlentities($test);
```
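A minimal sketch of the difference, assuming a standalone script with the ext-dom extension and the sample list hardcoded as input (`innerHtml` is a hypothetical helper name, not part of PHP's DOM API). In PHP's DOM extension, `nodeValue` on an element returns only the concatenated text of its descendants, so the tags are already gone before `htmlentities()` runs. Passing the node to `DOMDocument::saveHTML()` (supported since PHP 5.3.6) serializes its markup instead.

```php
<?php
// Sketch: compare nodeValue with serializing the node's markup.
// Assumes ext-dom is available; the input is the sample list from the question.
$html = '<ul id="adPoint1"><li>Business</li><li>Contract</li></ul>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of using @
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$ul = $xpath->evaluate("//ul[@id='adPoint1']")->item(0);

// nodeValue is the text of all descendants joined together: no tags.
$text = $ul->nodeValue;

// saveHTML() with a node argument serializes that node, tags included (outer HTML).
$outer = $dom->saveHTML($ul);

// Hypothetical helper: serialize only the node's children (inner HTML).
function innerHtml(DOMNode $node): string
{
    $result = '';
    foreach ($node->childNodes as $child) {
        $result .= $node->ownerDocument->saveHTML($child);
    }
    return $result;
}

$inner = innerHtml($ul);

echo $text, "\n"; // BusinessContract
echo htmlentities($outer), "\n";
echo htmlentities($inner), "\n";
```

`$outer` includes the `<ul>` element itself, while `$inner` holds only its `<li>` children; the exact whitespace between elements may depend on the libxml version.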
For some reason, the HTML tags are always stripped from the output. I guess this is because XPath was not intended to be used this way, but is there any way around this?
I would really like to keep using XPath, since I already use it to parse other parts of the page (individual href attributes) without problems.
EDIT: I know there is a better way to get this data by iterating through the UL's children. There is a more complex part of the page that I also want to parse (a JavaScript block), but I was trying to provide an easier-to-follow example.
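For the simple list, selecting the `<li>` children directly with XPath is one way to iterate over the items. A sketch, assuming the sample list is hardcoded as input:

```php
<?php
// Sketch: read each list item separately by selecting the <li> children with XPath.
// The input is the sample list from the question, hardcoded for illustration.
$html = '<ul id="adPoint1"><li>Business</li><li>Contract</li></ul>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep libxml parse warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

$items = [];
foreach ($xpath->query("//ul[@id='adPoint1']/li") as $li) {
    $items[] = $li->nodeValue; // text of one <li>
}

print_r($items);
```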
This is the actual block of code I want:

```html
<script language="javascript">document.write(rot_decode('<u7>Pbagnpg Qrgnvyf</u7><qy vq="pbagnpgQrgnvyf"><qg>Cu:</qg><qq>(58) 0078 8455</qq></qy>'));</script>
```
The problem here is that all of the closing tags are dropped, while the opening tags are kept. I assume XPath is trying to parse the inner elements instead of treating the content as a plain string.
If I try to select the script element with:

```php
$testDom = $this->xpath->evaluate("//div[@id='businessDetails']/script");
$test = $testDom->item(0)->nodeValue;
echo htmlentities($test);
```
the output is as follows. As you can see, all of the closing tags are missing:

```
document.write(rot_decode('<u7>Pbagnpg Qrgnvyf<qy vq="pbagnpgQrgnvyf"><qg>Cu:<qq>(58) 0078 8455'));
```
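One way to test the assumption that XPath is responsible is to read the same script node with and without XPath. A sketch, assuming the script block sits directly inside a `<div id="businessDetails">` (the question shows only the XPath expression, not that surrounding markup, so the wrapper here is hypothetical):

```php
<?php
// Sketch: check whether XPath or the HTML parser changed the script text.
// The script block is from the question; the wrapping div is an assumption.
$html = '<div id="businessDetails"><script language="javascript">'
    . "document.write(rot_decode('<u7>Pbagnpg Qrgnvyf</u7><qy vq=\"pbagnpgQrgnvyf\"><qg>Cu:</qg><qq>(58) 0078 8455</qq></qy>'));"
    . '</script></div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

// 1) Text of the script node as selected by XPath.
$xpath = new DOMXPath($dom);
$viaXPath = $xpath->evaluate("//div[@id='businessDetails']/script")->item(0)->nodeValue;

// 2) Text of the same script node, reached without XPath.
$viaDom = $dom->getElementsByTagName('script')->item(0)->nodeValue;

// XPath only selects nodes that loadHTML() already built; it never rewrites their text.
var_dump($viaXPath === $viaDom);
echo htmlentities($viaDom), "\n";
```

If the two strings are identical, whatever happened to the closing tags happened while `loadHTML()` was building the tree, not during the XPath query.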