Decode multiple inline XML tags with PHP

I'm looking for a “smart way” to decode multiple XML tags within a string. I have the following function:

    function b($params)
    {
        $xmldata = '<?xml version="1.0" encoding="UTF-8" ?><root>' . html_entity_decode($params['data']) . '</root>';
        $lang = ucfirst(strtolower($params['lang']));
        if (simplexml_load_string($xmldata) === FALSE) {
            return $params['data'];
        } else {
            $langxmlobj = new SimpleXMLElement($xmldata);
            if ($langxmlobj->$lang) {
                return $langxmlobj->$lang;
            } else {
                return $params['data'];
            }
        }
    }

And I try it like this:

    $params['data'] = '<French>Service DNS</French><English>DNS Service</English> - <French>DNS Gratuit</French><English>Free DNS</English>';
    $params['lang'] = 'French';
    $a = b($params);
    print_r($a);

But the output is:

 Service DNS 

And I want it to basically output each tag, so the result should be:

 Service DNS - DNS Gratuit 

I'm pulling my hair out. Any quick help or pointers would be appreciated.


Edit: Refining the requirements.

It seems I was not clear enough, so let me show you another example.

If I have the following line as input:

 The <French>Chat</French><English>Cat</English> is very happy to stay on stackoverflow because it makes him <French>Heureux</French><English>Happy</English> to know that it is the best <French>Endroit</French><English>Place</English> to find good people with good <French>Réponses</French><English>Answers</English>. 

So, if I run the function with "French", it will return:

 The Chat is very happy to stay on stackoverflow because it makes him Heureux to know that it is the best Endroit to find good people with good Réponses. 

And with 'English':

 The Cat is very happy to stay on stackoverflow because it makes him Happy to know that it is the best Place to find good people with good Answers. 

Hopefully this is clearer now.

+7
php xml-parsing
5 answers

Basically, I would first match each run of language tags, for example:

 <French>Chat</French><English>Cat</English> 

with this:

    "@(<($defLangs)>.*?</\\2>)+@i"

Then pick the requested language out of each match in a callback.

If you have PHP 5.3+, you can use a closure:

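    // $str holds the input text containing the language tags
    function transLang($str, $lang, $defLangs = 'French|English')
    {
        return preg_replace_callback(
            "@(<($defLangs)>.*?</\\2>)+@i",
            function ($matches) use ($lang) {
                // pick the requested language out of the matched run of tags
                preg_match("/<$lang>(.*?)<\/$lang>/i", $matches[0], $langSec);
                // empty string if this run has no entry for the language
                return isset($langSec[1]) ? $langSec[1] : '';
            },
            $str
        );
    }

    echo transLang($str, 'French'), "\n", transLang($str, 'English');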

If not, it takes a little more work:

    class LangHelper
    {
        private $lang;

        function __construct($lang)
        {
            $this->lang = $lang;
        }

        public function callback($matches)
        {
            $lang = $this->lang;
            // pick the requested language out of the matched run of tags
            preg_match("/<$lang>(.*?)<\/$lang>/i", $matches[0], $subMatches);
            return isset($subMatches[1]) ? $subMatches[1] : '';
        }
    }

    function transLang($str, $lang, $defLangs = 'French|English')
    {
        $langHelper = new LangHelper($lang);
        return preg_replace_callback(
            "@(<($defLangs)>.*?</\\2>)+@i",
            array($langHelper, 'callback'),
            $str
        );
    }

    echo transLang($str, 'French'), "\n", transLang($str, 'English');
+6

If I understood correctly, you would like to remove all the "language" tags but keep the content of the requested language.

DOM is a tree of nodes. Tags are element nodes; text is stored in text nodes. XPath lets you select nodes using expressions. So fetch the elements of the language you want to keep and copy all of their child nodes to just before the language element itself. Then remove all language elements. This works even if the language elements contain other element nodes, such as <em>.

    function replaceLanguageTags($fragment, $language)
    {
        $dom = new DOMDocument();
        $dom->loadXml(
            '<?xml version="1.0" encoding="UTF-8" ?><content>'.$fragment.'</content>'
        );
        // get an xpath object
        $xpath = new DOMXpath($dom);
        // fetch all nodes with the language you like to keep
        $nodes = $xpath->evaluate('//'.$language);
        foreach ($nodes as $node) {
            // copy all the child nodes to just before the found node
            foreach ($node->childNodes as $childNode) {
                $node->parentNode->insertBefore($childNode->cloneNode(TRUE), $node);
            }
            // remove the found node
            $node->parentNode->removeChild($node);
        }
        // select all language nodes
        $tags = array('English', 'French');
        $nodes = $xpath->evaluate('//'.implode('|//', $tags));
        foreach ($nodes as $node) {
            // remove them
            $node->parentNode->removeChild($node);
        }
        $result = '';
        // we do not need the root node, so save all its children
        foreach ($dom->documentElement->childNodes as $node) {
            $result .= $dom->saveXml($node);
        }
        return $result;
    }

    $xml = <<<'XML'
    The <French>Chat</French><English>Cat</English> is very happy to stay on stackoverflow because it makes him <French>Heureux</French><English>Happy</English> to know that it is the best <French>Endroit</French><English>Place</English> to find good people with good <French>Réponses</French><English>Answers</English>.
    XML;

    var_dump(replaceLanguageTags($xml, 'English'));
    var_dump(replaceLanguageTags($xml, 'French'));

Output:

    string(146) "The Cat is very happy to stay on stackoverflow because it makes him Happy to know that it is the best Place to find good people with good Answers."
    string(153) "The Chat is very happy to stay on stackoverflow because it makes him Heureux to know that it is the best Endroit to find good people with good Réponses."
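To illustrate the point about nested elements such as <em>, here is a compact sketch of the same idea. The helper name keepLanguageNodes and the sample sentence are made up for this example, and it moves the child nodes instead of cloning them, so treat it as a variation on the approach rather than the answer's exact code.

```php
<?php
// Hypothetical helper: keep the children of one language, drop the other languages.
function keepLanguageNodes($fragment, $keep, array $languages)
{
    $dom = new DOMDocument();
    $dom->loadXML('<?xml version="1.0" encoding="UTF-8"?><content>' . $fragment . '</content>');
    $xpath = new DOMXPath($dom);

    foreach ($languages as $language) {
        foreach ($xpath->query('//' . $language) as $node) {
            if ($language === $keep) {
                // Move every child (text or element) to just before the language element.
                while ($node->firstChild) {
                    $node->parentNode->insertBefore($node->firstChild, $node);
                }
            }
            // The language element itself is always removed.
            $node->parentNode->removeChild($node);
        }
    }

    // Serialize the children of the wrapper element, not the wrapper itself.
    $result = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $result .= $dom->saveXML($child);
    }
    return $result;
}

$input = 'A <French><em>très</em> bon</French><English><em>very</em> good</English> day';
$english = keepLanguageNodes($input, 'English', array('English', 'French'));
echo $english, "\n"; // A <em>very</em> good day
```

The <em> element survives because whole nodes are moved, not just their text.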
+3

What version of PHP are you using? I don’t know what else could be different, but I copied and pasted your code and received the following output:

    SimpleXMLElement Object
    (
        [0] => Service DNS
        [1] => DNS Gratuit
    )

To be sure, this is the code I copied from above:

    <?php
    function b($params)
    {
        $xmldata = '<?xml version="1.0" encoding="UTF-8" ?><root>' . html_entity_decode($params['data']) . '</root>';
        $lang = ucfirst(strtolower($params['lang']));
        if (simplexml_load_string($xmldata) === FALSE) {
            return $params['data'];
        } else {
            $langxmlobj = new SimpleXMLElement($xmldata);
            if ($langxmlobj->$lang) {
                return $langxmlobj->$lang;
            } else {
                return $params['data'];
            }
        }
    }

    $params['data'] = '<French>Service DNS</French><English>DNS Service</English> - <French>DNS Gratuit</French><English>Free DNS</English>';
    $params['lang'] = 'French';
    $a = b($params);
    print_r($a);
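One possible explanation for the difference (a guess, since the asker did not say how the value was printed): the property returned by SimpleXML stands for all matching children, but casting it to a string gives only the first one. A minimal sketch with the same data; the variable names are made up for the example.

```php
<?php
$xml = new SimpleXMLElement(
    '<root><French>Service DNS</French><English>DNS Service</English>'
    . ' - <French>DNS Gratuit</French><English>Free DNS</English></root>'
);

// Casting the property to a string yields the first <French> element only.
$first = (string) $xml->French;

// Iterating the same property visits every <French> child of <root>.
$all = array();
foreach ($xml->French as $element) {
    $all[] = (string) $element;
}

echo $first, "\n";               // Service DNS
echo implode(' | ', $all), "\n"; // Service DNS | DNS Gratuit
```

Either way, the " - " between the tags belongs to neither element, so collecting the elements alone cannot reproduce the wanted "Service DNS - DNS Gratuit".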
+2

Here is my suggestion. It should be quick and easy. You just strip the tags of the desired language (keeping their content), and then remove any other tags along with their content.

The downside is that if you want to use any other tags besides the language tags, you have to make sure that the opening tag differs from the closing tag (for example, <p >Lorem</p> instead of <p>Lorem</p>). On the other hand, it lets you add as many languages as you want without maintaining a list of them. You only need to know the default language (or just throw and catch exceptions) for when the requested language is missing.

    function only_lang($lang, $text)
    {
        static $infinite_loop = false;

        $result = str_replace("<$lang>", '', $text, $num_matches_open);
        $result = str_replace("</$lang>", '', $result, $num_matches_close);

        // Check if the text is malformed. Good place to throw an error
        if ($num_matches_open != $num_matches_close) {
            //throw new Exception('Opening and closing tags do not match', 1);
            return $text;
        }

        // Check if this language is present at all.
        // Otherwise fall back to the default language or throw an error
        if (!$num_matches_open) {
            //throw new Exception('No such language', 2);
            // Prevent infinite loop if even the default language is missing
            if ($infinite_loop) return $text;
            $infinite_loop = true;
            $fallback = only_lang('English', $text);
            $infinite_loop = false; // reset so later calls can fall back again
            return $fallback;
        }

        // Strip any other language and return the result.
        // Non-greedy, so the text between two language blocks is kept.
        return preg_replace('!<([^>]+)>.*?</\\1>!s', '', $result);
    }
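The <p > workaround mentioned above can be checked in isolation. This sketch applies a non-greedy variant of the final "strip everything else" pattern to two made-up strings; the sample text and variable names are assumptions for illustration only.

```php
<?php
// Non-greedy variant of the pattern that strips the remaining tag pairs.
$pattern = '!<([^>]+)>.*?</\\1>!s';

// <p> opens and closes with the same name, so it is treated like
// a foreign language tag and removed together with its content.
$plain = preg_replace($pattern, '', 'Intro <p>Lorem</p> outro');

// "<p >" captures "p " (with the space), which never matches "</p>",
// so this block is left alone.
$spaced = preg_replace($pattern, '', 'Intro <p >Lorem</p> outro');

var_dump($plain);  // string(12) "Intro  outro"
var_dump($spaced); // string(25) "Intro <p >Lorem</p> outro"
```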
+2

I kept it simple with a regex. Useful if the input contains only <lang>...</lang> tags.

    function to_lang($lang = "", $str = "")
    {
        return strip_tags(preg_replace('~<(\w+(?<!'.$lang.'))>.*</\1>~Us', "", $str));
    }

    echo to_lang("English", "The happy <French>Chat</French><English>Cat</English>");

This removes every <tag>...</tag> whose name is not the one given in $lang. If the tag name may contain spaces or special characters, e.g. <French-1>, replace \w with [^/>].


The search pattern explained a little

1.) <(\w+(?<!'.$lang.'))

< followed by one or more word characters that do not end in $lang (checked with a negative lookbehind); the tag name is captured as group 1

2.) .* followed by anything (ungreedy because of the U modifier; the dot also matches newlines because of the s modifier)

3.) </\1> up to the closing tag with the captured name
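The three steps can be written out piece by piece. This is only a sketch: the function name keepLanguage and the sample sentence are made up, and preg_quote is an addition that is not in the one-liner above.

```php
<?php
// Hypothetical helper that mirrors the three steps described above.
function keepLanguage($lang, $str)
{
    $pattern = '~<(\w+(?<!' . preg_quote($lang, '~') . '))>' // 1) opening tag whose name does not end in $lang
             . '.*'                                          // 2) anything, ungreedy because of U
             . '</\1>~Us';                                   // 3) the closing tag with the captured name

    // Drop the other languages' elements, then strip the tags that were kept.
    return strip_tags(preg_replace($pattern, '', $str));
}

$text = 'The <French>Chat</French><English>Cat</English> is '
      . '<French>Heureux</French><English>Happy</English>.';

echo keepLanguage('English', $text), "\n"; // The Cat is Happy.
echo keepLanguage('French', $text), "\n";  // The Chat is Heureux.
```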

+1
