Decode multiple XML tags inside a string with PHP

Question

I'm looking for a “smart way” to decode multiple XML tags within a string. I have the following function:

    function b($params)
    {
        $xmldata = '<?xml version="1.0" encoding="UTF-8" ?><root>' . html_entity_decode($params['data']) . '</root>';
        $lang = ucfirst(strtolower($params['lang']));
        if (simplexml_load_string($xmldata) === FALSE) {
            return $params['data'];
        } else {
            $langxmlobj = new SimpleXMLElement($xmldata);
            if ($langxmlobj->$lang) {
                return $langxmlobj->$lang;
            } else {
                return $params['data'];
            }
        }
    }

And I call it like this:

    $params['data'] = '<French>Service DNS</French><English>DNS Service</English> - <French>DNS Gratuit</French><English>Free DNS</English>';
    $params['lang'] = 'French';
    $a = b($params);
    print_r($a);

But the output is:

 Service DNS

And I want it to basically output each tag, so the result should be:

 Service DNS - DNS Gratuit

I'm pulling my hair out. Any quick help or pointers would be appreciated.

Edit: clarifying the requirements.

It seems I was not clear enough, so let me show you another example.

If I have the following line as input:

 The <French>Chat</French><English>Cat</English> is very happy to stay on stackoverflow because it makes him <French>Heureux</French><English>Happy</English> to know that it is the best <French>Endroit</French><English>Place</English> to find good people with good <French>Réponses</French><English>Answers</English>.

So, if I run the function with "French", it will return:

 The Chat is very happy to stay on stackoverflow because it makes him Heureux to know that it is the best Endroit to find good people with good Réponses.

And with 'English':

 The Cat is very happy to stay on stackoverflow because it makes him Happy to know that it is the best Place to find good people with good Answers.

Hopefully this is clearer now.

+7

php xml-parsing

Disco Dec 12 '13 at 12:01

5 answers

If I understood correctly, you would like to remove all the "language" tags but keep the contents of the requested language.

DOM is a tree of nodes. Tags are element nodes; text is stored in text nodes. XPath lets you select nodes using expressions. So take all the child nodes of the language elements you want to keep and copy them directly in front of their language node. Then remove all language nodes. This works even if the language elements contain other element nodes.

    function replaceLanguageTags($fragment, $language)
    {
        $dom = new DOMDocument();
        $dom->loadXml(
            '<?xml version="1.0" encoding="UTF-8" ?><content>'.$fragment.'</content>'
        );
        // get an xpath object
        $xpath = new DOMXpath($dom);

        // fetch all nodes with the language you like to keep
        $nodes = $xpath->evaluate('//'.$language);
        foreach ($nodes as $node) {
            // copy all child nodes to just before the found node
            foreach ($node->childNodes as $childNode) {
                $node->parentNode->insertBefore($childNode->cloneNode(TRUE), $node);
            }
            // remove the found node
            $node->parentNode->removeChild($node);
        }

        // select all language nodes
        $tags = array('English', 'French');
        $nodes = $xpath->evaluate('//'.implode('|//', $tags));
        foreach ($nodes as $node) {
            // remove them
            $node->parentNode->removeChild($node);
        }

        $result = '';
        // we do not need the root node, so save all its children
        foreach ($dom->documentElement->childNodes as $node) {
            $result .= $dom->saveXml($node);
        }
        return $result;
    }

    $xml = <<<'XML'
    The <French>Chat</French><English>Cat</English> is very happy to stay on stackoverflow because it makes him <French>Heureux</French><English>Happy</English> to know that it is the best <French>Endroit</French><English>Place</English> to find good people with good <French>Réponses</French><English>Answers</English>.
    XML;

    var_dump(replaceLanguageTags($xml, 'English'));
    var_dump(replaceLanguageTags($xml, 'French'));

Output:

    string(146) "The Cat is very happy to stay on stackoverflow because it makes him Happy to know that it is the best Place to find good people with good Answers."
    string(153) "The Chat is very happy to stay on stackoverflow because it makes him Heureux to know that it is the best Endroit to find good people with good Réponses."
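The point about nested element nodes can be checked with a compact sketch of the same DOM idea. The helper name `keepLanguage`, its parameters, and the sample input are illustrative assumptions and not part of the answer; it also moves the children out of the wrapper instead of cloning them, and it assumes language elements are never nested inside each other.

```php
<?php
// Hypothetical helper: keeps the children of <$language> elements and drops
// every other element named in $allLanguages together with its contents.
function keepLanguage(string $fragment, string $language, array $allLanguages): string
{
    $dom = new DOMDocument();
    $dom->loadXML('<?xml version="1.0" encoding="UTF-8"?><content>' . $fragment . '</content>');
    $xpath = new DOMXPath($dom);

    // Select every language element, e.g. "//French|//English"
    foreach ($xpath->query('//' . implode('|//', $allLanguages)) as $node) {
        if ($node->nodeName === $language) {
            // Move each child (text or element) in front of its wrapper
            while ($node->firstChild !== null) {
                $node->parentNode->insertBefore($node->firstChild, $node);
            }
        }
        // The wrapper itself is removed in both cases
        $node->parentNode->removeChild($node);
    }

    // Serialize the children of the artificial root, not the root itself
    $result = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $result .= $dom->saveXML($child);
    }
    return $result;
}

$sample = 'The <French><b>Chat</b> noir</French><English><b>Cat</b></English>.';
echo keepLanguage($sample, 'French', ['French', 'English']), "\n";
```

With this input the `<b>` element inside the French wrapper survives, while the English wrapper disappears together with its nested markup.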

+3

Thw Dec 17 '13 at 10:59

What version of PHP are you using? I don’t know what else could be different, but I copied and pasted your code and received the following output:

    SimpleXMLElement Object
    (
        [0] => Service DNS
        [1] => DNS Gratuit
    )

To be sure, this is the code I copied from above:

    <?php
    function b($params)
    {
        $xmldata = '<?xml version="1.0" encoding="UTF-8" ?><root>' . html_entity_decode($params['data']) . '</root>';
        $lang = ucfirst(strtolower($params['lang']));
        if (simplexml_load_string($xmldata) === FALSE) {
            return $params['data'];
        } else {
            $langxmlobj = new SimpleXMLElement($xmldata);
            if ($langxmlobj->$lang) {
                return $langxmlobj->$lang;
            } else {
                return $params['data'];
            }
        }
    }

    $params['data'] = '<French>Service DNS</French><English>DNS Service</English> - <French>DNS Gratuit</French><English>Free DNS</English>';
    $params['lang'] = 'French';
    $a = b($params);
    print_r($a);
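One possible explanation for the differing outputs, offered as a guess and not as a diagnosis: the value returned by `b()` is a list-like SimpleXMLElement covering every matching child, and a string cast of such an object yields only its first element. The sketch below uses the question's data with made-up variable names.

```php
<?php
$xml = new SimpleXMLElement(
    '<root><French>Service DNS</French><English>DNS Service</English> - '
    . '<French>DNS Gratuit</French><English>Free DNS</English></root>'
);

// Property access returns every <French> child, not just the first one
$french = $xml->French;

echo count($french), "\n";    // number of <French> elements
echo (string) $french, "\n";  // a string cast (as echo does) uses the first element only

// Iterating visits each matching element in document order
$parts = [];
foreach ($french as $element) {
    $parts[] = (string) $element;
}
echo implode(' | ', $parts), "\n";
```

Note that the " - " between the tags belongs to neither element, so collecting the elements this way does not reproduce the surrounding text.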

+2

Joe t Dec 14 '13 at 13:50

Here is my suggestion. It should be quick and easy. You just strip the tags of the desired language, and then remove any other tags along with their contents.

The downside is that if you want to use any tags other than the language tags, you have to make sure the text of the opening tag differs from the name in the closing tag; otherwise the final pattern removes them together with their contents. On the other hand, it lets you add as many languages as you want without maintaining a list of them. You only need to know the default language (or just throw and catch exceptions) for when the requested language is missing.

    function only_lang($lang, $text)
    {
        static $infinite_loop;

        $result = str_replace("<$lang>", '', $text, $num_matches_open);
        $result = str_replace("</$lang>", '', $result, $num_matches_close);

        // Check if the text is malformed. Good place to throw an error
        if ($num_matches_open != $num_matches_close) {
            //throw new Exception('Opening and closing tags do not match', 1);
            return $text;
        }

        // Check if this language is present at all.
        // Otherwise fall back to the default language or throw an error
        if (! $num_matches_open) {
            //throw new Exception('No such language', 2);
            // Prevent infinite loop if even the default language is missing
            if ($infinite_loop) {
                return $text;
            }
            $infinite_loop = __FUNCTION__;
            $fallback = $infinite_loop('English', $text);
            // Reset the guard so later calls can fall back again
            $infinite_loop = null;
            return $fallback;
        }

        // Strip any other language and return the result.
        // The quantifier is lazy (.*?) so each tag pair is removed on its own
        // instead of everything between the first opener and the last closer.
        return preg_replace('!<([^>]+)>.*?</\\1>!s', '', $result);
    }
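The last step of this approach depends on whether `.*` is greedy. A small sketch of the difference, using made-up input that contains two tag pairs of the unwanted language:

```php
<?php
// Text as it looks after the wanted language's tags were already stripped
$text = 'Chat<English>Cat</English> - Gratuit<English>Free</English>';

// Greedy: .* runs from the first opener to the LAST matching closer
$greedy = preg_replace('!<([^>]+)>.*</\\1>!', '', $text);

// Lazy: .*? stops at the FIRST matching closer, so each pair is removed separately
$lazy = preg_replace('!<([^>]+)>.*?</\\1>!s', '', $text);

echo $greedy, "\n"; // Chat
echo $lazy, "\n";   // Chat - Gratuit
```

The greedy form swallows the text between the two pairs, which matters as soon as a string holds more than one translated fragment.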

+2

core1024 Dec 18 '13 at 9:00

I kept it simple with a regex. This is useful if the input contains only <lang>...</lang> tags.

    function to_lang($lang = "", $str = "")
    {
        return strip_tags(preg_replace('~<(\w+(?<!'.$lang.'))>.*</\1>~Us', "", $str));
    }

    echo to_lang("English", "The happy <French>Chat</French><English>Cat</English>");

This removes every <tag>...</tag> pair whose name is not the one given in $lang. If a tag name may contain spaces or special characters, e.g. <French-1>, replace \w with [^/>] .

The search pattern, briefly explained:

1.) <(\w+(?<!'.$lang.'))

< followed by one or more word characters that do not end in $lang (enforced with a negative lookbehind); the tag name is captured as group 1.

2.) .* followed by anything (ungreedy because of the U modifier; the dot also matches newlines because of the s modifier)

3.) </\1> until the captured tag is closed
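The three parts of the pattern can be tried out with a short sketch. The wrapper name `keepOnly` and the sample input are assumptions made for illustration, and `preg_quote` is an addition so that a language name containing special characters stays literal.

```php
<?php
// Hypothetical wrapper around the pattern described above
function keepOnly(string $lang, string $str): string
{
    // 1.) tag name that does not end in $lang, 2.) ungreedy content, 3.) matching closer
    $pattern = '~<(\w+(?<!' . preg_quote($lang, '~') . '))>.*</\1>~Us';

    // Remove the other languages, then strip the remaining (wanted) tags
    return strip_tags(preg_replace($pattern, '', $str));
}

$input = 'The <French>Chat</French><English>Cat</English> is '
       . '<French>Heureux</French><English>Happy</English>.';

echo keepOnly('French', $input), "\n";  // The Chat is Heureux.
echo keepOnly('English', $input), "\n"; // The Cat is Happy.
```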

+1

Jonny 5 Dec 21 '13 at 0:45

Andrew · Accepted Answer · 2013-12-14T13:59:58+0000

Basically, I would first match each language section, for example:

 <French>Chat</French><English>Cat</English>

with this:

    "@(<($defLangs)>.*?</\\2>)+@i"

Then extract the requested language from each match with a callback.

If you have PHP 5.3+, then:

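    function transLang($str, $lang, $defLangs = 'French|English')
    {
        return preg_replace_callback(
            // one run of adjacent language tags, e.g. <French>..</French><English>..</English>
            "@(<($defLangs)>.*?</\\2>)+@i",
            function ($matches) use ($lang) {
                // pick the requested language out of the matched run
                preg_match("/<$lang>(.*?)<\/$lang>/i", $matches[0], $longSec);
                return $longSec[1];
            },
            $str
        );
    }

    // $str holds the input text from the question
    echo transLang($str, 'French'), "\n", transLang($str, 'English');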

If not, it takes a little more work:

    class LangHelper
    {
        private $lang;

        function __construct($lang)
        {
            $this->lang = $lang;
        }

        public function callback($matches)
        {
            $lang = $this->lang;
            // pick the requested language out of the matched run
            preg_match("/<$lang>(.*?)<\/$lang>/i", $matches[0], $subMatches);
            return $subMatches[1];
        }
    }

    function transLang($str, $lang, $defLangs = 'French|English')
    {
        $langHelper = new LangHelper($lang);
        return preg_replace_callback(
            "@(<($defLangs)>.*?</\\2>)+@i",
            array($langHelper, 'callback'),
            $str
        );
    }

    // $str holds the input text from the question
    echo transLang($str, 'French'), "\n", transLang($str, 'English');
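A self-contained sketch of the callback approach, under a few assumptions that are not in the answer: the function name `pickLanguage` is hypothetical, the languages are passed as an array and escaped with `preg_quote`, and a run of tags that lacks the requested language is returned unchanged. It uses scalar type declarations, so it assumes PHP 7 or later.

```php
<?php
// Hypothetical variant of the callback approach with a fallback for missing languages
function pickLanguage(string $str, string $lang, array $langs = ['French', 'English']): string
{
    $quoted = array_map(function ($name) {
        return preg_quote($name, '@');
    }, $langs);

    // One run of adjacent language tags, e.g. <French>..</French><English>..</English>
    $section = '@(?:<(' . implode('|', $quoted) . ')>.*?</\1>)+@is';

    return preg_replace_callback($section, function (array $m) use ($lang) {
        $name  = preg_quote($lang, '@');
        $inner = '@<' . $name . '>(.*?)</' . $name . '>@is';
        // Assumption: keep the run as-is when the requested language is absent
        return preg_match($inner, $m[0], $hit) ? $hit[1] : $m[0];
    }, $str);
}

$str = 'The <French>Chat</French><English>Cat</English> is '
     . '<French>Heureux</French><English>Happy</English>.';

echo pickLanguage($str, 'French'), "\n";  // The Chat is Heureux.
echo pickLanguage($str, 'English'), "\n"; // The Cat is Happy.
```

Returning the run unchanged is only one possible policy; falling back to a default language inside the callback would work just as well.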
