PHP SimpleXML does not preserve line breaks in XML attributes

I have to parse externally supplied XML whose attributes contain line breaks. With SimpleXML, the line breaks seem to be lost. According to another Stack Overflow question, line breaks in attribute values should be valid XML (albeit far from ideal).

Why are they lost? And how can I preserve them?

Here is a demo script (note that when the line breaks are not inside an attribute, they are preserved).

PHP file with embedded XML

    $xml = <<<XML
    <?xml version="1.0" encoding="utf-8"?>
    <Rows>
        <data Title='Data Title' Remarks='First line of the row.
    Followed by the second line.
    Even a third!' />
        <data Title='Full Title' Remarks='None really'>First line of the row.
    Followed by the second line.
    Even a third!</data>
    </Rows>
    XML;

    $xml = new SimpleXMLElement( $xml );
    print '<pre>'; print_r($xml); print '</pre>';

Output of print_r

    SimpleXMLElement Object
    (
        [data] => Array
            (
                [0] => SimpleXMLElement Object
                    (
                        [@attributes] => Array
                            (
                                [Title] => Data Title
                                [Remarks] => First line of the row. Followed by the second line. Even a third!
                            )

                    )

                [1] => First line of the row.
    Followed by the second line.
    Even a third!
            )

    )
+6
xml php simplexml
6 answers

The character reference for a newline is &#10;. I played with your code until I found something that did the trick. It is not very elegant, I warn you:

    //First, remove any indentation (leading spaces and tabs on each line):
    $xml = preg_replace('/^[ \t]+/m', "", $xml);

    //Next, unify all newlines into Unix LF:
    $xml = str_replace("\r","\n", $xml);
    $xml = str_replace("\n\n","\n", $xml);

    //Next, replace all newlines with the character reference:
    $xml = str_replace("\n","&#10;", $xml);

    //Finally, replace any newline references between >< with a real newline:
    $xml = str_replace(">&#10;<",">\n<", $xml);

The assumption, based on your example, is that any newline occurring inside a node or attribute is followed by more text on the next line, not by a < that opens a new element.

This will, of course, fail if the next line starts with text wrapped in an inline element.

+4

Using SimpleXML, line breaks seem lost.

Yes, this is expected. In fact, any conformant XML parser is required to treat newlines in attribute values as simple spaces. See attribute-value normalization in the XML specification.

If the attribute value is meant to contain a genuine newline character, the XML should contain the character reference &#10; instead of a raw newline.
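
A quick way to see the difference is to parse one attribute written with a raw newline and one written with &#10;. This is only a minimal sketch; the attribute names Raw and Ref are made up for the illustration.

```php
<?php
// "\n" inside the double-quoted PHP string is a raw newline in the XML source;
// "&#10;" is passed through to the parser as a character reference.
$doc = new SimpleXMLElement("<Rows><data Raw='a\nb' Ref='a&#10;b' /></Rows>");

$raw = (string) $doc->data['Raw'];
$ref = (string) $doc->data['Ref'];

var_dump($raw === "a b");   // the raw newline was normalized to a space
var_dump($ref === "a\nb");  // the character reference became a real newline
```

Both comparisons should come out true, which matches the behaviour seen in the question.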

+11

Assuming $xmlData is your XML string before it is sent to the parser, this should replace all newlines in attributes with the correct character reference. I had this problem with XML coming from SQL Server.

    $parts = explode("<", $xmlData); //split over <
    array_shift($parts); //remove the blank array element
    $newParts = array(); //create array for storing new parts
    foreach($parts as $p)
    {
        list($attr,$other) = explode(">", $p, 2); //get attribute data into $attr
        $attr = str_replace("\r\n", "&#10;", $attr); //do the replacement
        $newParts[] = $attr.">".$other; // put parts back together
    }
    $xmlData = "<".implode("<", $newParts); // put parts back together prefixing with <

This can probably be done more simply with a regex, but regexes are not my strong point.
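
For reference, a regex-based variant might look like the sketch below. It is not the code from this answer: the function name encodeAttributeNewlines is hypothetical, and it assumes attribute values never contain a literal < or >. Unlike the loop above, it only touches text inside quotes and also handles bare \n and \r, not just \r\n.

```php
<?php
// Hypothetical helper: encode newlines found inside quoted attribute values.
function encodeAttributeNewlines(string $xmlData): string
{
    return preg_replace_callback(
        '/<[^<>]+>/', // each tag, from "<" to the next ">"
        function (array $tag): string {
            return preg_replace_callback(
                '/"[^"]*"|\'[^\']*\'/', // each quoted value inside the tag
                function (array $value): string {
                    return preg_replace('/\r\n|\r|\n/', '&#10;', $value[0]);
                },
                $tag[0]
            );
        },
        $xmlData
    );
}

$xmlData = "<Rows>\r\n<data Title='T' Remarks='one\r\ntwo' />\r\n</Rows>";
$doc = new SimpleXMLElement(encodeAttributeNewlines($xmlData));
$remarks = (string) $doc->data['Remarks'];

var_dump($remarks === "one\ntwo");
```

Quoted text inside comments or processing instructions would be rewritten as well, so treat this as a starting point rather than a general solution.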

+1

Here is the code to replace newlines with the appropriate character reference in this particular XML fragment. Run this code before parsing.

    $replaceFunction = function ($matches) {
        return str_replace("\n", "&#10;", $matches[0]);
    };
    $xml = preg_replace_callback(
        "/<data Title='[^']+' Remarks='[^']+'/i",
        $replaceFunction, $xml);
+1

This is what worked for me:

First, load the XML as a string:

  $xml = file_get_contents($urlXml); 

Then do the replacement:

  $xml = str_replace(".\xe2\x80\xa9<as:eol/>",".\n\n<as:eol/>",$xml); 

The "." and "<as:eol/>" are there because, in my case, that is where I needed to add the breaks. You can replace the newlines "\n" with whatever you like.

After replacing, simply load the xml string as a SimpleXMLElement object:

  $xmlo = new SimpleXMLElement( $xml ); 

Et Voilà

0

Well, this question is old, but someone may end up on this page eventually, as I did. I took a slightly different approach, which I think is a bit more elegant.

Inside the XML, put a unique token that you will use to mark a new line.

Change the XML to:

 <data Title='Data Title' Remarks='First line of the row. \n Followed by the second line. \n Even a third!' /> 

Then, when you have read the desired node from SimpleXML into the string $output, write something like this:

    $findme = '\n'; // single quotes: the literal two characters \ and n
    $pos = strpos($output, $findme);
    if ($pos !== false) {
        $output = str_replace($findme, "<br/>", $output);
    }

It does not have to be \n; it can be any unique character sequence.
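
As a rough end-to-end sketch of this idea, assuming you control the XML and can write the token into it, the following turns the token back into real newlines after parsing. The variable names $lines and $text are only illustrative.

```php
<?php
// Nowdoc (<<<'XML') keeps the backslash-n token literal; in a heredoc or a
// double-quoted string PHP itself would turn it into a real newline.
$xml = <<<'XML'
<Rows>
<data Title='Data Title' Remarks='First line of the row. \n Followed by the second line. \n Even a third!' />
</Rows>
XML;

$doc = new SimpleXMLElement($xml);
$output = (string) $doc->data['Remarks'];

// Split on the literal token, trim the padding around it, and rejoin with real newlines.
$lines = array_map('trim', explode('\n', $output));
$text = implode("\n", $lines);

print $text;
```

Swapping the "\n" in implode() for "<br/>" gives the HTML variant shown above.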

0
