DOMDocument->saveHTML() vs urlencode with the commercial at symbol (@)

Using DOMDocument, I replace the links in $message and add some things, for example [@MERGEID]. When I save the changes with $dom_document->saveHTML(), the links get partially URL-encoded: [@MERGEID] becomes %5B@MERGEID%5D.

Later in my code I need to replace [@MERGEID] with an ID, so I search for urlencode('[@MERGEID]'). However, urlencode() changes the commercial at (@) to %40, while saveHTML() leaves it alone. So there is no match: '%5B@MERGEID%5D' != '%5B%40MERGEID%5D'.

For now I can run str_replace('%40', '@', urlencode('[@MERGEID]')) to get the string I need to find the merge variable in $message.
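A minimal sketch of that workaround in plain PHP (no DOM involved). It also calls rawurlencode() to show that, for this token, it produces exactly the same string as urlencode(), so swapping encoders alone would not remove the need for the str_replace. The variable names are illustrative only.

```php
<?php
$token = '[@MERGEID]';

// Both encoders percent-encode '@' (%40) as well as '[' (%5B) and ']' (%5D).
$viaUrlencode    = urlencode($token);
$viaRawurlencode = rawurlencode($token);

// Undo only the '@' escape so the needle looks like the %5B@MERGEID%5D in the question's output.
$needle = str_replace('%40', '@', $viaUrlencode);

echo $needle, PHP_EOL;
```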

My question is: which RFC specification does DOMDocument follow, and why is it different from urlencode() or even rawurlencode()? Is there anything I can do to avoid the str_replace?

Demo code:

$message = '<a href="http://www.google.com?ref=abc" data-tag="thebottomlink">Google</a>';
$dom_document = new \DOMDocument();
libxml_use_internal_errors(true); //Suppress content errors
$dom_document->loadHTML(mb_convert_encoding($message, 'HTML-ENTITIES', 'UTF-8'));       
$elements = $dom_document->getElementsByTagName('a');
foreach($elements as $element) {    
    $link = $element->getAttribute('href'); //http://www.google.com?ref=abc
    $tag = $element->getAttribute('data-tag'); //thebottomlink
    if ($link) {
        $newlink = 'http://www.example.com/click/[@MERGEID]?url=' . $link;
        if ($tag) {
            $newlink .= '&tag=' . $tag;
        } 
        $element->setAttribute('href', $newlink);
    }
}
$message = $dom_document->saveHTML();
$urlencodedmerge = urlencode('[@MERGEID]');
die($message . ' and url encoded version: ' . $urlencodedmerge); 
//<a data-tag="thebottomlink" href="http://www.example.com/click/%5B@MERGEID%5D?url=http://www.google.com?ref=abc&amp;tag=thebottomlink">Google</a> and url encoded version: %5B%40MERGEID%5D
5 answers

I believe these two encodings serve different purposes. urlencode() encodes "a string to be used in the query part of a URL", while the href value you set with $element->setAttribute('href', $newlink); is escaped as a complete URL to be used as a URL.

For instance:

urlencode('http://www.google.com'); // -> http%3A%2F%2Fwww.google.com

This is convenient for encoding part of the query string, but the result cannot be used in <a href='...'>.

However:

$element->setAttribute('href', $newlink); // -> http://www.google.com

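This escapes the string so that it can still be used in href. The reason it does not encode @ is that it cannot tell whether @ belongs to the query or to the userinfo part of a URL or an e-mail address (for example: mailto:invisal@google.com or invisal@127.0.0.1).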


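  • Instead of [@MERGEID], use @@MERGEID@@ and replace it with your ID later. This does not require urlencode at all.

  • If you insist on using urlencode, write %40 instead of @ in the link, so your code becomes $newlink = 'http://www.example.com/click/[%40MERGEID]?url=' . $link;

  • Or encode the token yourself when building the link: $newlink = 'http://www.example.com/click/' . urlencode('[@MERGEID]') . '?url=' . $link;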
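A minimal sketch of the first suggestion, based on the question's demo code. It assumes, as the question's own output suggests, that saveHTML() leaves @ and letters in an href untouched, so the @@MERGEID@@ placeholder survives serialization; 42 is a made-up stand-in for the real ID.

```php
<?php
libxml_use_internal_errors(true); // collect HTML parser warnings instead of printing them

$message = '<a href="http://www.google.com?ref=abc" data-tag="thebottomlink">Google</a>';
$dom = new DOMDocument();
$dom->loadHTML($message);

foreach ($dom->getElementsByTagName('a') as $element) {
    $link = $element->getAttribute('href');
    if ($link) {
        // Only letters and '@' in the placeholder, so the href escaping leaves it alone.
        $element->setAttribute('href', 'http://www.example.com/click/@@MERGEID@@?url=' . $link);
    }
}

$html = $dom->saveHTML();

// Later: a plain string replacement, no urlencode() needed. 42 is a hypothetical ID.
$merged = str_replace('@@MERGEID@@', '42', $html);
echo $merged;
```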


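urlencode and rawurlencode were originally based on RFC 1738 (as of PHP 5.3.0, rawurlencode follows RFC 3986 and no longer encodes ~). Since 2005, however, the current RFC for URIs has been RFC 3986.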

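DOM, on the other hand, works in UTF-8, which is specified in RFC 3629. Use utf8_encode() and utf8_decode() for ISO-8859-1 (both are deprecated as of PHP 8.2), or iconv for other encodings.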

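The generic URI syntax requires that new URI schemes which represent character data in a URI must, in effect, represent characters from the unreserved set without translation, and should convert all other characters to bytes according to UTF-8 and then percent-encode those values.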
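To make the encoder difference concrete: the PHP manual documents rawurlencode() as encoding according to RFC 3986, while urlencode() follows the application/x-www-form-urlencoded convention, so the two disagree on spaces and ~. Both still encode @, which RFC 3986 lists as a reserved character. A small sketch, assuming PHP 5.3.0 or later:

```php
<?php
$input = 'a b~@';

$form = urlencode($input);    // form encoding: space becomes '+', '~' is escaped
$rfc  = rawurlencode($input); // RFC 3986: space becomes %20, '~' is unreserved

printf("urlencode:    %s\nrawurlencode: %s\n", $form, $rfc);
```

In both results the @ comes out as %40, which is why neither function matches the %5B@MERGEID%5D that saveHTML() produced in the question.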

Here is a function that URL-encodes a string but leaves the RFC 3986 reserved characters (and %) unencoded:

<?php
    // URL-encode with urlencode(), then turn the escapes for the RFC 3986 reserved
    // characters (and '%') back into literal characters. Spaces still come out as '+'.
    function myUrlEncode($string) {
       $entities = array('%21', '%2A', '%27', '%28', '%29', '%3B', '%3A', '%40', '%26', '%3D', '%2B', '%24', '%2C', '%2F', '%3F', '%25', '%23', '%5B', '%5D');
       $replacements = array('!', '*', "'", "(", ")", ";", ":", "@", "&", "=", "+", "$", ",", "/", "?", "%", "#", "[", "]");
       return str_replace($entities, $replacements, urlencode($string));
    }
?>



Update:

Your $message is UTF-8, and you already convert it when loading it:

$dom_document->loadHTML(mb_convert_encoding($message, 'HTML-ENTITIES', 'UTF-8'))

After saving, you can run urldecode($message) to decode the URLs before searching for the token:

die(urldecode($message) . ' and url encoded version: ' . $urlencodedmerge); 
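A sketch of that update applied to the question's link, with 42 as a placeholder ID. It assumes the token only needs to be located after serialization: whether saveHTML() emits %5B@MERGEID%5D (as in the question) or leaves the brackets alone, urldecode() yields the raw [@MERGEID] either way. Note that urldecode() decodes every %XX sequence and turns + into a space across the whole document, so this is only safe when the rest of the markup contains no encoded data you need to keep.

```php
<?php
libxml_use_internal_errors(true); // collect HTML parser warnings instead of printing them

$dom = new DOMDocument();
$dom->loadHTML('<a href="http://www.google.com?ref=abc">Google</a>');
$a = $dom->getElementsByTagName('a')->item(0);
$a->setAttribute('href', 'http://www.example.com/click/[@MERGEID]?url=' . $a->getAttribute('href'));

$message = $dom->saveHTML();

// Decoding turns any %5B/%5D back into '[' and ']', so the raw token can be searched for.
$decoded = urldecode($message);
$final = str_replace('[@MERGEID]', '42', $decoded); // 42 is a hypothetical ID
echo $final;
```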

Once $message has been loaded into DomDocument, treat it as an abstract black box and stop thinking of it as "raw" HTML.

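If you do need low-level string fiddling on a value inside a DomDocument, extract the raw value, change it, and push it back in: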

$token = 'blah blah [@MERGEID]';
$message = '<a id="' . $token . '" href="' . $token . '"></a>';

$dom = new DOMDocument();
$dom->loadHTML($message);
echo $dom->saveHTML(); // now we have an abstract HTML document

// extract a raw value
$rawstring = $dom->getElementsByTagName('a')->item(0)->getAttribute('href');
// do the low-level fiddling
$newstring = str_replace($token, 'replaced', $rawstring);
// push the new value back into the abstract black box.
$dom->getElementsByTagName('a')->item(0)->setAttribute('href', $newstring);

// less code written, but works all the time
$rawstring = $dom->getElementsByTagName('a')->item(0)->getAttribute('id');
$newstring = str_replace($token, 'replaced', $rawstring);
$dom->getElementsByTagName('a')->item(0)->setAttribute('id', $newstring);

echo $dom->saveHTML();

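Applied to the question's link, the same extract, modify, push-back pattern could look like the sketch below. It assumes the ID (here the made-up value 42) is already known while the document is still loaded; because getAttribute() returns the raw attribute value, the token matches exactly and any escaping only happens later in saveHTML().

```php
<?php
libxml_use_internal_errors(true); // collect HTML parser warnings instead of printing them

$dom = new DOMDocument();
$dom->loadHTML('<a href="http://www.google.com?ref=abc" data-tag="thebottomlink">Google</a>');
$a = $dom->getElementsByTagName('a')->item(0);

// Build the tracking link with the raw token, as in the question.
$a->setAttribute('href', 'http://www.example.com/click/[@MERGEID]?url=' . $a->getAttribute('href'));

// Extract the raw value: no URL escaping has been applied to it.
$raw = $a->getAttribute('href');

// Do the string replacement, then push the new value back into the DOM.
$a->setAttribute('href', str_replace('[@MERGEID]', '42', $raw));

// Serialization (and any href escaping) happens only after the token is gone.
$html = $dom->saveHTML();
echo $html;
```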


A more general version, which recursively replaces the token in every attribute and text node:

function searchAndReplace(DOMNode $node, $search, $replace) {
    if($node->hasAttributes()) {
        foreach ($node->attributes as $attribute) {
            $input = $attribute->nodeValue;
            $output = str_replace($search, $replace, $input);
            $attribute->nodeValue = $output;
        }
    }

    if ($node instanceof DOMCharacterData) { // text, CDATA and comment nodes hold the character data
        $input = $node->nodeValue;
        $output = str_replace($search, $replace, $input);
        $node->nodeValue = $output;
    }

    if($node->hasChildNodes()) {
        foreach ($node->childNodes as $child) {
            searchAndReplace($child, $search, $replace);
        }
    }
}

$token = '<>&;[@MERGEID]';
$message = '<a/>';

$dom = new DOMDocument();
$dom->loadHTML($message);

$dom->getElementsByTagName('a')->item(0)->setAttribute('id', "foo$token");
$dom->getElementsByTagName('a')->item(0)->setAttribute('href', "http://foo@$token");
$textNode = new DOMText("foo$token");
$dom->getElementsByTagName('a')->item(0)->appendChild($textNode);

echo $dom->saveHTML();

searchAndReplace($dom, $token, '*replaced*');

echo $dom->saveHTML();

Use saveXML() instead of saveHTML(); it does not URL-escape the href:


//your code...
$message = $dom_document->saveXML();

EDIT: saveXML() also adds an XML declaration; to get rid of it:

//this will add an xml tag, so just remove it
$message=preg_replace("/\<\?xml(.*?)\?\>/","",$message);

echo $message;

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><a href="http://www.example.com/click/[@MERGEID]?url=http://www.google.com?ref=abc&amp;tag=thebottomlink" data-tag="thebottomlink">Google</a></body></html>

The only character that is still changed is &, which is written as &amp;.
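As an alternative to the regex, a sketch assuming you only need the markup from the html element down: passing a node to saveXML() serializes just that node, so neither the XML declaration nor the DOCTYPE is emitted. saveXML() still writes XML syntax (for example, empty elements may come out self-closing), so check that this is acceptable for your HTML.

```php
<?php
libxml_use_internal_errors(true); // collect HTML parser warnings instead of printing them

$dom = new DOMDocument();
$dom->loadHTML('<a href="http://www.google.com?ref=abc">Google</a>');
$a = $dom->getElementsByTagName('a')->item(0);
$a->setAttribute('href', 'http://www.example.com/click/[@MERGEID]?url='
    . $a->getAttribute('href') . '&tag=thebottomlink');

// Serialize only the root element: no XML declaration, no DOCTYPE, no regex cleanup.
$xml = $dom->saveXML($dom->documentElement);
echo $xml;
```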


Why not URL-encode [@MERGEID] when you build the link, so that str_replace can later search for exactly the same string?

$newlink = 'http://www.example.com/click/'.urlencode('[@MERGEID]').'?url=' . $link;
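A sketch of this suggestion, with 42 as a placeholder ID. It relies on an assumption: when saveHTML() escapes an href it leaves the % sign and existing %XX sequences alone (as it did for the %5B/%5D it produced in the question), so the encoded token comes out identical to urlencode('[@MERGEID]').

```php
<?php
libxml_use_internal_errors(true); // collect HTML parser warnings instead of printing them

$dom = new DOMDocument();
$dom->loadHTML('<a href="http://www.google.com?ref=abc">Google</a>');
$a = $dom->getElementsByTagName('a')->item(0);

// Encode the token yourself when building the link...
$encoded = urlencode('[@MERGEID]');
$a->setAttribute('href', 'http://www.example.com/click/' . $encoded . '?url=' . $a->getAttribute('href'));

$html = $dom->saveHTML();

// ...then search for exactly the same encoded string later. 42 is a hypothetical ID.
$final = str_replace($encoded, '42', $html);
echo $final;
```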
