Problem adding a root path using PHP DOMDocument

I would like to add the site root path to those anchor tags that do not already have one, using PHP's DOMDocument. So far I have written a function that does this with str_replace, but for some links the root path gets added three or more times. What should I change in this function?

Problem: the root path is added three or more times for some anchor tags, and not at all for others. The $HTML variable contains many anchor tags, about 200 links. The same applies to images.

I know this is a rather messy question, but I can't see what I missed.

    function addRootPathToAnchor($HTML) {
        $tmpHtml = '';
        $xml = new DOMDocument();
        $xml->validateOnParse = true;
        $xml->loadHTML($HTML);
        foreach ($xml->getElementsByTagName('a') as $a) {
            $href = $a->getAttribute('href');
            if(strpos($href,'www' > 0))
                continue;
            else
                $HTML = str_replace($href,"http://www.mysite.com/".$href,$HTML);
        }
        return $HTML;
    }
2 answers

I see some problems in your code:

  • How you decide whether the URI has a full root path (is a full URI) or not.
  • You do not resolve relative URLs against the base URL. Just prepending the root does not do the job.
  • The function returns a DOMDocument object, not a string. I assume you do not want that, but I cannot tell, because you did not say so in your question.

How to determine if a URL is relative.

Relative URLs do not specify a protocol (scheme). So you can check for that to determine whether the href attribute is a complete (absolute) URI or not:

 $isRelative = (bool) !parse_url($url, PHP_URL_SCHEME); 
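To see how that check behaves on a few typical href values, here is a minimal sketch. The wrapper name isRelativeUrl() and the sample URLs are only illustrative assumptions, not part of the question. Note that a value such as www.mysite.com/page has no scheme, so this check treats it as relative.

```php
<?php
// Hypothetical wrapper around the scheme check shown above.
function isRelativeUrl(string $url): bool
{
    // parse_url() returns null when the component is absent
    // (and false for seriously malformed input); both count as "no scheme".
    return !parse_url($url, PHP_URL_SCHEME);
}

// Illustrative sample values only.
$samples = [
    'http://www.example.com/a.html',
    'mailto:someone@example.com',
    'images/logo.png',
    '/test/index.html',
    'www.mysite.com/page',
];

foreach ($samples as $url) {
    echo $url, ' => ', isRelativeUrl($url) ? 'relative' : 'absolute', PHP_EOL;
}
```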

Resolving a relative URL to a base URL

However, this does not help you correctly resolve a relative URL against the base URL. What you are doing is conceptually broken. How to resolve relative URIs against a base URL is specified in the RFCs (RFC 1808 and RFC 3986). You can use an existing library to do the work for you; one that works is Net_URL2:

    require_once('Net/URL2.php'); # or configure your autoloader

    $baseUrl = 'http://www.example.com/test/images.html';
    $hrefRelativeOrAbsolute = '...';

    $baseUrl = new Net_URL2($baseUrl);
    $urlAbsolute = (string) $baseUrl->resolve($hrefRelativeOrAbsolute);
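To give a rough idea of what "resolving" involves beyond simple prepending, here is a simplified sketch for http(s) base URLs. The function name resolveUrl() is hypothetical, it assumes the base URL has a scheme and a host, and it is not a complete RFC 3986 implementation, so a library such as Net_URL2 remains the safer choice.

```php
<?php
// Simplified sketch, NOT a full RFC 3986 resolver. resolveUrl() is a
// hypothetical name; $base is assumed to contain a scheme and a host.
function resolveUrl(string $base, string $ref): string
{
    if ($ref === '') {
        return $base;                        // empty reference: keep the base
    }
    if (parse_url($ref, PHP_URL_SCHEME)) {
        return $ref;                         // already absolute
    }
    $b = parse_url($base);
    if (strncmp($ref, '//', 2) === 0) {
        return $b['scheme'] . ':' . $ref;    // protocol-relative: reuse base scheme
    }
    $origin = $b['scheme'] . '://' . $b['host']
        . (isset($b['port']) ? ':' . $b['port'] : '');
    $basePath = $b['path'] ?? '/';
    if ($ref[0] === '?') {
        return $origin . $basePath . $ref;   // query-only reference
    }
    if ($ref[0] === '#') {                   // fragment-only reference
        return $origin . $basePath
            . (isset($b['query']) ? '?' . $b['query'] : '') . $ref;
    }

    // Split the reference into its path and the trailing query/fragment.
    $cut = strcspn($ref, '?#');
    $path = substr($ref, 0, $cut);
    $suffix = substr($ref, $cut);

    // A path without a leading slash is relative to the base "directory".
    if ($path[0] !== '/') {
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $path;
    }

    // Remove "." and ".." segments.
    $segments = explode('/', $path);
    $out = [];
    foreach ($segments as $seg) {
        if ($seg === '..') {
            if (count($out) > 1) {
                array_pop($out);             // never pop the leading empty segment
            }
        } elseif ($seg !== '.') {
            $out[] = $seg;
        }
    }
    $last = end($segments);
    if ($last === '.' || $last === '..') {
        $out[] = '';                         // keep the trailing slash
    }
    return $origin . implode('/', $out) . $suffix;
}

$base = 'http://www.example.com/test/images.html';
echo resolveUrl($base, '../a.png'), PHP_EOL;
echo resolveUrl($base, 'pic.png'), PHP_EOL;
```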

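Instead of if(strpos($href,'www' > 0)) you should use if(strpos($href,'www') !== false).

The > 0 ended up inside the function call (strpos()), so strpos() received the result of the comparison 'www' > 0 (a boolean) as its needle instead of the string 'www'.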
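Even with the condition fixed, str_replace() runs over the whole $HTML string, so every occurrence of a given href value is rewritten each time the loop meets it, including occurrences that were already prefixed. That alone can prefix the same link several times. A minimal sketch of an alternative, assuming it is acceptable to get back the document as serialized by DOMDocument: change the attribute on each node and call saveHTML(). The function name and the $root parameter are hypothetical, and the plain prepending here is not full relative-URL resolution.

```php
<?php
// Hypothetical helper: prefix scheme-less hrefs with a site root by editing
// the DOM nodes themselves, so each link is touched exactly once.
function addRootPathToAnchors(string $html, string $root): string
{
    $doc = new DOMDocument();

    // Collect libxml warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');

        // Leave empty values, fragment-only links, protocol-relative URLs
        // and anything that already has a scheme untouched.
        if ($href === ''
            || $href[0] === '#'
            || strncmp($href, '//', 2) === 0
            || parse_url($href, PHP_URL_SCHEME)
        ) {
            continue;
        }

        // Simple prepend (an assumption; not RFC 3986 resolution).
        $a->setAttribute('href', rtrim($root, '/') . '/' . ltrim($href, '/'));
    }

    // Note: loadHTML() wraps fragments in <html><body>, so the result is a
    // full document rather than the original fragment.
    return $doc->saveHTML();
}

$html = '<p><a href="about.html">A</a> <a href="about.html">B</a> '
      . '<a href="http://other.example/x">C</a></p>';
echo addRootPathToAnchors($html, 'http://www.mysite.com/');
```

The same loop could be repeated for img elements and their src attribute.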
