What is the purpose of the DOMDocument->documentURI property?

Here's a link to the documentation: http://php.net/manual/en/class.domdocument.php#domdocument.props.documenturi

But I don’t understand whether this property is a value that the object detects by itself, or a parameter that the user can change.

Does this value have any effect on HTML parsing with the loadHTML method? Can it be used to absolutize all relative references in the document being parsed?

2 answers

Well, I hope I explain it correctly.

The following is the W3C DOM interface specification for documentURI :

documentURI of type DOMString , introduced in DOM Level 3

The location of the document, or null if undefined or if the Document was created using DOMImplementation.createDocument . No lexical checking is performed when setting this attribute; this could result in a null value being returned when using Node.baseURI .

Beware that when the Document supports the feature "HTML" [DOM Level 2 HTML], the href attribute of the HTML BASE element takes precedence over this attribute when computing Node.baseURI .

What does this mean for you?

But I don’t understand if this parameter is the value that this object detects, or can the user change this parameter?

This is the URI of the document. If you load a remote URI, for example this page, it will contain that remote URI, i.e. the URL currently displayed in the browser address bar. The property is public, so it is writable.
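As a minimal sketch of the "writable" part: the snippet below assigns a value to documentURI by hand after loading an HTML string. The URL is an arbitrary example value, not something taken from the question.

```php
<?php
$doc = new DOMDocument();
$doc->loadHTML('<p>hello world</p>');

// documentURI is a public property, so user code can assign to it.
// The URL below is only an example value.
$doc->documentURI = 'http://example.com/dir/page.html';

var_dump($doc->documentURI);
```

Assigning the property only records a location; it does not reload or re-parse anything.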

Does this value have any effect on html parsing using the loadHTML method?

In theory, yes. In practice, it depends on whether your DOMImplementation supports the "HTML" 2.0 feature.

Can this be used to absolutize all relative references in the document being analyzed?

Not automatically. But you can certainly use it to prepend it manually to any links that start with a path. Of course, you need to implement the logic that checks whether an href value needs to be expanded.
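The manual approach could be sketched roughly as follows. The helper name absolutize() and the example URLs are made up for illustration, and the resolution logic is deliberately simplified: it is not a full RFC 3986 implementation and ignores "../" segments, protocol-relative URLs ("//host/..."), and query-only or fragment-only references.

```php
<?php
/**
 * Hypothetical helper: resolve $href against $base in a simplified way.
 * Assumes $base is an absolute URL with a scheme and a host.
 */
function absolutize(string $href, string $base): string
{
    // Already absolute (has a scheme such as http: or mailto:): keep as-is.
    if (preg_match('~^[a-z][a-z0-9+.-]*:~i', $href)) {
        return $href;
    }

    $parts  = parse_url($base);
    $origin = $parts['scheme'] . '://' . $parts['host']
            . (isset($parts['port']) ? ':' . $parts['port'] : '');

    // Root-relative path: prepend scheme and host only.
    if (strncmp($href, '/', 1) === 0) {
        return $origin . $href;
    }

    // Path-relative: append to the directory part of the base path.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);

    return $origin . $dir . $href;
}

$doc = new DOMDocument();
$doc->loadHTML('<p><a href="/a.html">a</a> <a href="b.html">b</a> <a href="http://example.org/c">c</a></p>');

// Use documentURI when it is set, otherwise fall back to an example base.
$base = $doc->documentURI ?: 'http://example.com/dir/page.html';

foreach ($doc->getElementsByTagName('a') as $link) {
    $link->setAttribute('href', absolutize($link->getAttribute('href'), $base));
}

echo $doc->saveHTML();
```

For production use, a proper URI resolver that handles dot segments and the remaining edge cases would be the safer choice.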


The DOMDocument::$documentURI property is well explained in the PHP manual:

The location of the document, or NULL if undefined.

This is a public property that is set if you are loading a document from a location. This is usually the name of the file (for example, "file:///C:/Tests/dom/data/file1.xml" ) or the URI ( "data://text/html;encoding=base64,PHA+aGVsbG8gd29ybGQ8L3A+" ) used in DOMDocument::load() or DOMDocument::loadHTMLFile() respectively.

If you are loading an XML string ( DOMDocument::loadXML() ), then documentURI is the current working directory.

If you load an HTML string ( DOMDocument::loadHTML() ), then documentURI is NULL , regardless of whether the HTML contains a <base href=""> element.

Examples:

    <?php
    /**
     * what is the purpose of DOMDocument->documentURI property?
     * @link https://stackoverflow.com/q/4003543/367456
     */

    $doc = new DOMDocument();

    $doc->load(__DIR__ . '/data/file1.xml');
    var_dump($doc->documentURI); # "file:///C:/Tests/dom/data/file1.xml"

    $doc->loadHTMLFile(__DIR__ . '/data/file1.html');
    var_dump($doc->documentURI); # "file:///C:/Tests/dom/data/file1.html"

    $doc->loadXML('<p>hello world</p>');
    var_dump($doc->documentURI); # "file:///C:/Tests/dom/" (current working directory)

    $doc->loadHTML('<p>hello world</p>');
    var_dump($doc->documentURI); # NULL

    $doc->loadHTML('<base href="http://example.com/base/"><i>test</i>');
    var_dump($doc->documentURI); # NULL

    $doc->loadHTMLFile('data://text/html;encoding=base64,' . base64_encode('<p>hello world</p>'));
    var_dump($doc->documentURI); # "data://text/html;encoding=base64,PHA+aGVsbG8gd29ybGQ8L3A+"

Caution: This property may be modeled after the DOM Core Level 3.0 specification (in combination with DOMNode::$baseURI ); however, that DOM Core level (as a so-called feature and version) is not supported by PHP's DOMDocument .

This property can be used to get or set the base URI of an HTML document. If it is NULL or an empty string, you need to provide it yourself. For an example of resolving links in a document, see the article "Add root path with PHP DOMDocument".
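One way to "provide it yourself" is sketched below, under assumptions: the helper name findBaseUri() and the fallback URL are hypothetical. It checks for a <base href> element first, then documentURI, and finally a caller-supplied fallback. Preferring <base href> mirrors the precedence the DOM specification describes for Node.baseURI; PHP does not apply that precedence for you.

```php
<?php
/**
 * Hypothetical helper: pick a base URI for a loaded document.
 * Order: <base href>, then documentURI, then a caller-supplied fallback.
 */
function findBaseUri(DOMDocument $doc, string $fallback): string
{
    // A non-empty <base href="..."> in the markup wins.
    foreach ($doc->getElementsByTagName('base') as $base) {
        $href = trim($base->getAttribute('href'));
        if ($href !== '') {
            return $href;
        }
    }

    // Next, use documentURI when it holds a non-empty value.
    $uri = $doc->documentURI;
    if ($uri !== null && $uri !== '') {
        return $uri;
    }

    // Nothing usable in the document: use the value supplied by the caller.
    return $fallback;
}

$doc = new DOMDocument();
$doc->loadHTML('<html><head><base href="http://example.com/base/"></head><body><i>test</i></body></html>');

echo findBaseUri($doc, 'http://example.org/'), "\n";
```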
