Extract some XML tags from a string with PHP

I have the following function:

function translate($params) {
    $xmldata = '<?xml version="1.0" encoding="UTF-8" ?><root>' . html_entity_decode($params['data']) . '</root>';
    $lang = ucfirst(strtolower($params['lang']));
    if (simplexml_load_string($xmldata) === FALSE) {
        return $params['data'];
    } else {
        $langxmlobj = new SimpleXMLElement($xmldata);

        if ($langxmlobj -> $lang) {
            return ($langxmlobj -> $lang);
        } else {
            return $params['data'];
        }
    }
}

Which works great with strings like:

$params['data'] = '<English>Hello</English><French>Bonjour</French>';
$params['lang'] = 'English';
print translate($params);

output:

Hello

But...

When a string contains any other tags:

$params['data'] = '<English><h1>Hello</h1></English><French><h1>Bonjour</h1></French>';
$params['lang'] = 'English';

It does not output anything.

I want it to output:

<h1>Hello</h1>, or whatever other tags are inside the language tag.

I'm pulling my hair out; any ideas?
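The empty output comes from how SimpleXML turns an element into a string. Casting or printing a SimpleXMLElement returns only the text that sits directly inside it, and here "Hello" lives inside the child <h1>. The following is a minimal sketch of the difference. It assumes you want the inner markup rather than the text and serializes each child element with asXML():

```php
<?php
$xml = new SimpleXMLElement('<root><English><h1>Hello</h1></English></root>');
$english = $xml->English;

// String conversion only sees text nodes directly inside <English>: there are none.
var_dump((string) $english); // string(0) ""

// Serialize each child element to get the inner markup instead.
$inner = '';
foreach ($english->children() as $child) {
    $inner .= $child->asXML();
}
echo $inner, "\n"; // <h1>Hello</h1>
```

children() only returns element children, so text placed directly inside the language tag (next to other elements) would be skipped by this sketch.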

VERSION 2:

It also fails on data like this:

$data = '<French><li><span class="pull-right">25 GB</span>Espace disque</French><English><li><span class="pull-right">25 GB</span>Disk Space</English>
<French><li><span class="pull-right">YES</span>PHP 5, MySQL 5</French><English><li><span class="pull-right">YES</span>PHP 5, MySQL 5</English>
<French><li><span class="pull-right">100</span>Bases de données</French><English><li><span class="pull-right">100</span>Databases</English>
<French><li><span class="pull-right"></span>E-Mails</French><English><li><span class="pull-right"></span>E-mails</English>';

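Your problem has two parts:

  • Loading the tagged string into an XML document
  • Retrieving the data from the XML

Loading the data into XML

The string is not well-formed XML (the li elements are never closed), but DOMDocument can load it with its more lenient HTML parser. That parser assumes ISO-8859-1 unless told otherwise, so add a meta tag declaring UTF-8. Otherwise characters such as the "é" in "Bases de données" get mangled.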
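You can confirm that the XML parser rejects the version-2 string by feeding one line of it to simplexml_load_string() with libxml errors collected instead of printed. The sketch below uses only that one line, wrapped in a <root> element the same way the question's translate() does:

```php
<?php
// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// One line of the version-2 data: <li> is opened but never closed.
$fragment = '<French><li><span class="pull-right">25 GB</span>Espace disque</French>';
$result = simplexml_load_string('<root>' . $fragment . '</root>');

var_dump($result === false); // bool(true): not well-formed XML

// Print what libxml complained about.
foreach (libxml_get_errors() as $error) {
    echo trim($error->message), "\n";
}
libxml_clear_errors();
```

The question's translate() gets the same false return and therefore just hands back the raw data.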

$data = '<French><li><span class="pull-right">25 GB</span>Espace disque</French><English><li><span class="pull-right">25 GB</span>Disk Space</English>
<French><li><span class="pull-right">YES</span>PHP 5, MySQL 5</French><English><li><span class="pull-right">YES</span>PHP 5, MySQL 5</English>
<French><li><span class="pull-right">100</span>Bases de données</French><English><li><span class="pull-right">100</span>Databases</English>
<French><li><span class="pull-right"></span>E-Mails</French><English><li><span class="pull-right"></span>E-mails</English>';    

$html_data = 
  '<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>
   <body>'.$data.'</body>';

libxml_use_internal_errors(TRUE);
$dom = new DOMDocument();
$dom->loadHtml($html_data);
$dom->formatOutput = TRUE;

echo $dom->saveXml();

Output:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
  <body>
    <french>
      <li><span class="pull-right">25 GB</span>Espace disque</li>
    </french>
    <english>
      <li><span class="pull-right">25 GB</span>Disk Space</li>
    </english>
    <french>
      <li><span class="pull-right">YES</span>PHP 5, MySQL 5</li>
    </french>
    <english>
      <li><span class="pull-right">YES</span>PHP 5, MySQL 5</li>
    </english>
    ...
  </body>
</html>

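As you can see, the parser lowercased the element names and closed the li elements. It also wrapped everything in html and body elements, so take that into account when you select the nodes.

Retrieving the data from XML

Once the data is in a DOM, you can use XPath to fetch it.

For example, to import the body element into SimpleXML: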

$xpath = new DOMXpath($dom);
$root = simplexml_import_dom($xpath->evaluate('/html/body')->item(0));
var_dump($root);

Output:

object(SimpleXMLElement)#4 (2) {
  ["french"]=>
  array(4) {
    [0]=>
    object(SimpleXMLElement)#3 (1) {
      ["li"]=>
      object(SimpleXMLElement)#12 (1) {
        ["span"]=>
        string(5) "25 GB"
      }
    }
    ...
  }
  ["english"]=>
  array(4) {
    [0]=>
    object(SimpleXMLElement)#5 (1) {
      ["li"]=>
      object(SimpleXMLElement)#12 (1) {
        ["span"]=>
        string(5) "25 GB"
      }
    }
    ...

Or, to get the HTML of the elements inside the english elements:

$xpath = new DOMXpath($dom);
$string = '';
foreach ($xpath->evaluate('/html/body/*[name() = "english"]/*') as $node) {
  $string .= $dom->saveHtml($node);
}
echo $string;

Output:

<li>
<span class="pull-right">25 GB</span>Disk Space</li><li>
<span class="pull-right">YES</span>PHP 5, MySQL 5</li><li>
<span class="pull-right">100</span>Databases</li><li>
<span class="pull-right"></span>E-mails</li>
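The two steps can be wrapped into a replacement for the question's translate(). This is a sketch under a few assumptions. translate_html is a hypothetical name, $lang is assumed to be a plain tag name (it is inserted into the XPath expression unescaped), and node() is used instead of * so that text sitting directly inside the language tag is kept too:

```php
<?php
// Hypothetical helper wrapping the DOMDocument + XPath approach above.
function translate_html(string $data, string $lang): string
{
    // Declare UTF-8 so the HTML parser does not assume ISO-8859-1.
    $html = '<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
          . '<body>' . $data . '</body>';

    $previous = libxml_use_internal_errors(true); // unknown tags like <english> raise errors
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // The HTML parser lowercases element names, so compare in lowercase.
    $query = sprintf('/html/body/*[name() = "%s"]/node()', strtolower($lang));

    $result = '';
    foreach ($xpath->evaluate($query) as $node) {
        $result .= $dom->saveHTML($node);
    }
    return $result;
}

echo translate_html('<English><h1>Hello</h1></English><French><h1>Bonjour</h1></French>', 'English'), "\n";
```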

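You don't have to parse the string as XML at all. A regular expression that captures everything between the custom tags is enough.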

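/**
 * $matches[0] -> the whole match, including the custom tag
 * $matches[1] -> only the content between the custom tags
 *
 * @param string $data
 * @param string $tag
 * @return string The content of the first $tag element, or '' if there is none
 */
function find_between_custom_tag($data, $tag) {
    // Escape the tag name for the regex; the "s" modifier lets the content span lines.
    $quoted = preg_quote($tag, '/');
    $regex = '/<' . $quoted . '>(.*?)<\/' . $quoted . '>/s';
    if (preg_match($regex, $data, $matches)) {
        return $matches[1];
    }
    return '';
}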

$data = '<English><h1>Hello</h1></English><French><h1>Bonjour</h1></French>';
$tag = 'English';

echo '<pre>';
echo htmlspecialchars( find_between_custom_tag($data, $tag) );
echo '</pre>';

Output:

<h1>Hello</h1>

Another regex-based approach, which collects every block for the requested language:

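function extractXMLLang($data, $ce) {
  // Map language abbreviations to the tag names used in the data
  $all = array(
    "en" => "english",
    "fr" => "french",
  );
  // Fall back to English for unknown abbreviations
  $lang = isset($all[$ce]) ? $all[$ce] : 'english';
  // "i": case-insensitive tag names; "s": content may span several lines
  $re = "/<" . $lang . ">(.*?)<\/" . $lang . ">/is";
  preg_match_all($re, $data, $matches);
  $return = '';
  foreach ($matches[1] as $name) {
    $return .= $name;
  }
  return $return;
}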

//Load your XML data
$test = '
  <english>This is in english</english>
  <english><div><span>This is also in english</span></div></english>
  <french><div><span>This is some text</span></div></french>
  <french><span>Regex Power!</span></french>
';
$str = '<?xml version="1.0" encoding="UTF-8" ?><root></root>';
echo $str.extractXMLLang($test,'en');

Call it as extractXMLLang(String, Language-Abbreviation).


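The root problem is that the string mixes two kinds of markup: the XML language tags and the HTML inside them, and the XML parser treats both as XML.

To put HTML inside XML without it being parsed as elements, escape the HTML special characters with htmlspecialchars(). Avoid htmlentities() here: it produces named entities such as &eacute; that XML does not define.

To turn the escaped text back into HTML, use html_entity_decode().

For example: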

$htmlSpecialFrench = htmlspecialchars('<li><span class="pull-right">25 GB</span>Espace disque');

$htmlSpecialFrench now contains:

&lt;li&gt;&lt;span class=&quot;pull-right&quot;&gt;25 GB&lt;/span&gt;Espace disque

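Do the same with the English text to get $htmlSpecialEnglish.

Then put the escaped HTML inside the XML tags:

$data = "<French>$htmlSpecialFrench</French><English>$htmlSpecialEnglish</English>";

The HTML in $data is now plain text as far as the XML parser is concerned, so it is no longer parsed as elements. When you need the HTML back, use html_entity_decode().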
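A minimal sketch of the whole round trip, reusing this answer's variable names. The English string is assumed by analogy with the French one. Once SimpleXML has parsed the document, converting an element to a string already decodes the entities, so html_entity_decode() is only needed when you work with the escaped text directly:

```php
<?php
// Escape the HTML so the XML parser treats it as text, not elements.
$htmlSpecialFrench  = htmlspecialchars('<li><span class="pull-right">25 GB</span>Espace disque');
$htmlSpecialEnglish = htmlspecialchars('<li><span class="pull-right">25 GB</span>Disk Space');
$data = "<French>$htmlSpecialFrench</French><English>$htmlSpecialEnglish</English>";

// The escaped data parses even though the <li> tags are never closed.
$xml = simplexml_load_string('<?xml version="1.0" encoding="UTF-8"?><root>' . $data . '</root>');

// Converting the element to a string decodes the XML entities,
// which yields the original HTML fragment.
$english = (string) $xml->English;
echo $english, "\n";

// The raw escaped string can be restored with html_entity_decode().
echo html_entity_decode($htmlSpecialEnglish), "\n";
```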


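If I understand you correctly, you need to escape the tags that sit inside the language tags as entities. I tested this script (in Dreamweaver):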

<?php
$params= '<English>&lt;h1&gt;Hello&lt;/h1&gt;</English><French>&lt;h1&gt;Bonjour&lt;/h1&gt;</French>';
print $params;
?>

Write &lt;h1&gt; for <h1> and &lt;/h1&gt; for </h1>.
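Be careful when you combine this with the question's translate(). That function calls html_entity_decode() on the data before parsing, which turns &lt;h1&gt; back into <h1> and brings back the original problem. Here is a sketch of a variant that parses the escaped data as-is. translate_escaped is a hypothetical name, and otherwise it keeps the question's $params layout:

```php
<?php
// Hypothetical variant of the question's translate() that does NOT call
// html_entity_decode() before parsing, so escaped markup survives as text.
function translate_escaped(array $params): string
{
    $xml = simplexml_load_string('<?xml version="1.0" encoding="UTF-8"?><root>' . $params['data'] . '</root>');
    $lang = ucfirst(strtolower($params['lang']));
    if ($xml === false || !isset($xml->$lang)) {
        return $params['data'];
    }
    // The string cast decodes &lt;h1&gt; back to <h1>.
    return (string) $xml->$lang;
}

$params = [
    'data' => '<English>&lt;h1&gt;Hello&lt;/h1&gt;</English><French>&lt;h1&gt;Bonjour&lt;/h1&gt;</French>',
    'lang' => 'english',
];
echo translate_escaped($params), "\n"; // <h1>Hello</h1>
```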
