Is it stable to reuse a DOMXpath object with DOMDocument?

I use the function below, but I am not sure whether it is always stable/safe. Is it?

When, and for which operations, is it stable/safe to reuse parts of the DOMXpath preparation procedure?

To simplify the use of the XPath query() method, we can use a function that remembers the last call in static variables:

    function DOMXpath_reuser($file) {
        static $doc = NULL;
        static $docName = '';
        static $xp = NULL;
        if (!$doc)
            $doc = new DOMDocument();
        if ($file != $docName) {
            $doc->loadHTMLFile($file);
            $docName = $file; // remember which file is loaded
            $xp = NULL;
        }
        if (!$xp)
            $xp = new DOMXpath($doc);
        return $xp; // ??RETURNED VALUES ARE ALWAYS STABLE??
    }

This question is similar to this other one about reusing XSLTProcessor. In both questions, the problem can be generalized to any language or framework that uses LibXML2 as its DomDocument implementation.

There is another related question: "How to update the DOMDocument instances of LibXML2?"


Illustrating

Reuse is very useful (examples):

    $f = "my_XML_file.xml";
    $elements = DOMXpath_reuser($f)->query("//*[@id]");
    // use elements to get information
    $elements = DOMXpath_reuser($f)->query("/html/body/div[1]");
    // use elements to get information

But if you do something like removeChild, replaceChild, etc. (example),

    $div = DOMXpath_reuser($f)->query("/html/body/div[1]")->item(0); // STABLE
    $div->parentNode->removeChild($div); // CHANGES DOM
    $elements = DOMXpath_reuser($f)->query("//div[@id]"); // UNSTABLE!!!

strange things may happen and the queries may not work as expected!

  • When? (Which DOMDocument methods affect XPath?)
  • Why can't we use something like normalizeDocument to "update the DOM"? (Does such a thing exist?)
  • Is "new DOMXpath($doc);" alone always safe, or do I need to reload $doc as well?
3 answers

The DOMXpath class (unlike XSLTProcessor in your other question) keeps a reference to the DOMDocument object passed to its constructor. DOMXpath creates a libxml context object based on the given DOMDocument and saves it in the internal data of the class. In addition, the libxml context stores a reference to the original document specified in the constructor arguments.

What does this mean:

Part of the sample from ThomasWeinert's answer:

    var_dump($xpath->document === $dom); // bool(true)
    $dom->loadXml($xml);
    var_dump($xpath->document === $dom); // bool(false)

gives false after loading because $dom now holds a pointer to the new libxml data, while the DOMXpath still holds the libxml context that was created for $dom before the load, along with its own pointer to the underlying document.
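A minimal sketch of this effect, assuming PHP with the DOM extension enabled; the variable names and the sample XML are made up for illustration. It reloads a document after a DOMXpath was created for it, then compares the old instance with one created after the reload.

```php
<?php
$dom = new DOMDocument();
$dom->loadXML('<root>old</root>');
$stale = new DOMXPath($dom);        // context bound to the first load

$dom->loadXML('<root>new</root>');  // $dom now points at new libxml data

// The old DOMXPath is no longer attached to $dom (same check as in the sample).
$detached = ($stale->document !== $dom);

// A DOMXPath created after the reload evaluates against the new content.
$fresh = new DOMXPath($dom);
$value = $fresh->evaluate('string(/root)');

var_dump($detached);
var_dump($value);
```

The identity check is the same one the class-based answer uses to decide when a new DOMXpath has to be created.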

Now about how query works

If the query should return an XPATH_NODESET (as in your case), it makes a copy node by node, iterating through the matched node set (\ext\dom\xpath.c, from line 468). The nodes are copied, but with the original document node as parent. This means that you can modify the result, but doing so breaks the connection between your XPath and the DOMDocument.

XPath results provide a parentNode member that knows their origin:

  • for attribute values, parentNode returns the element that carries them. An example is //foo/@attribute, where the parent is the foo element.
  • for the text() function (as in //text()), it returns the element containing the text or tail that was returned.
  • note that parentNode may not always return an element. For example, the XPath functions string() and concat() build strings that have no origin. For them, parentNode will return None.

So,

  • There is no reason to cache the XPath object. Creating it does nothing more than call xmlXPathNewContext (which just allocates a lightweight internal struct).
  • Every time you change your DOMDocument (removeChild, replaceChild, etc.), you must recreate the XPath object.
  • We cannot use something like normalizeDocument to "update the DOM", because it changes the internal document structure and invalidates the context that xmlXPathNewContext created in the XPath constructor.
  • Is "new DOMXpath($doc);" alone always safe? Yes, if you do not change $doc between XPath uses. Do you need to reload $doc as well? No, because that invalidates the previously created xmlXPathNewContext context.
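The advice above (do not cache, recreate after each change) can be sketched as follows. This is only an illustration under stated assumptions: fresh_query is a hypothetical helper name, not part of the DOM extension, and the sample markup is invented.

```php
<?php
// Hypothetical helper: builds a new DOMXPath for every query instead of
// caching one. Assumes a valid expression (query() returns false otherwise,
// which would violate the declared return type).
function fresh_query(DOMDocument $doc, string $expression): DOMNodeList
{
    return (new DOMXPath($doc))->query($expression);
}

$doc = new DOMDocument();
$doc->loadXML('<html><body><div id="a"/><div id="b"/></body></html>');

$div = fresh_query($doc, '/html/body/div[1]')->item(0);
$div->parentNode->removeChild($div);          // changes the DOM

$remaining = fresh_query($doc, '//div[@id]'); // new context after the change
echo $remaining->length, "\n";                // one div is left
```

Since the constructor only allocates a small context, building it per query costs little compared with parsing the document, which is loaded once and passed around.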

The DOMXpath is affected by the load*() methods of DOMDocument. After loading new XML or HTML, you need to recreate the DOMXpath instance:

    $xml = '<xml/>';
    $dom = new DOMDocument();
    $dom->loadXml($xml);
    $xpath = new DOMXpath($dom);
    var_dump($xpath->document === $dom); // bool(true)
    $dom->loadXml($xml);
    var_dump($xpath->document === $dom); // bool(false)

In DOMXpath_reuser() you keep the objects in static variables and recreate the XPath depending on the file name. If you want to reuse the XPath object, I suggest extending DOMDocument. That way, you only need to pass the $dom variable around. It works with a stored XML file as well as with an XML string or a document that you build yourself.

The following class extends DOMDocument with an xpath() method that always returns a valid DOMXpath instance for it. It also stores and registers namespaces:

    class MyDOMDocument extends DOMDocument {

        private $_xpath = NULL;
        private $_namespaces = array();

        public function xpath() {
            // if the xpath instance is missing or not attached to the document
            if (is_null($this->_xpath) || $this->_xpath->document !== $this) {
                // create a new one
                $this->_xpath = new DOMXpath($this);
                // and register the namespaces for it
                foreach ($this->_namespaces as $prefix => $namespace) {
                    $this->_xpath->registerNamespace($prefix, $namespace);
                }
            }
            return $this->_xpath;
        }

        public function registerNamespaces(array $namespaces) {
            $this->_namespaces = array_merge($this->_namespaces, $namespaces);
            if (isset($this->_xpath)) {
                foreach ($namespaces as $prefix => $namespace) {
                    $this->_xpath->registerNamespace($prefix, $namespace);
                }
            }
        }
    }

    $xml = <<<'ATOM'
    <feed xmlns="http://www.w3.org/2005/Atom">
      <title>Test</title>
    </feed>
    ATOM;

    $dom = new MyDOMDocument();
    $dom->registerNamespaces(
        array(
            'atom' => 'http://www.w3.org/2005/Atom'
        )
    );

    $dom->loadXml($xml);
    // created, first access
    var_dump($dom->xpath()->evaluate('string(/atom:feed/atom:title)', NULL, FALSE));

    $dom->loadXml($xml);
    // recreated, connection was lost
    var_dump($dom->xpath()->evaluate('string(/atom:feed/atom:title)', NULL, FALSE));

(this is not a real answer, but a consolidation of comments and answers posted here and related questions)


This new version of the question's DOMXpath_reuser function includes @ThomasWeinert's suggestion (to guard against the DOM being changed by an external reload) and the $enforceRefresh option to work around the instability problem (since, as the related question shows, the programmer must decide when to refresh).

    function DOMXpath_reuser_v2($file, $enforceRefresh = 0) { // changed here
        static $doc = NULL;
        static $docName = '';
        static $xp = NULL;
        if (!$doc)
            $doc = new DOMDocument();
        if ($file != $docName || ($xp && $doc !== $xp->document)) { // changed here
            $doc->load($file);
            $docName = $file; // remember which file is loaded
            $xp = NULL;
        } elseif ($enforceRefresh == 2) { // add this new refresh mode
            $doc->loadXML($doc->saveXML());
            $xp = NULL;
        }
        if (!$xp || $enforceRefresh == 1) // changed here
            $xp = new DOMXpath($doc);
        return $xp;
    }

When do I need to use $enforceRefresh=1?

... perhaps an open problem, just small tips and tricks ...

  • when the DOM has been modified with setAttribute, removeChild, replaceChild, etc.
  • ...? more cases?

When do I need to use $enforceRefresh=2?

... perhaps an open problem, just small tips and tricks ...
