How can you edit valid XML on a web page?

I need to get a quick and dirty configuration editor up and running. The flow goes something like this:

Configuration (POCOs on the server) is serialized to XML.
At this point, the XML is well formed. The configuration is sent to the web server in XElements.
On the web server, the XML (yes, ALL OF IT) is dumped into a textarea for editing.
The user edits the XML directly in the web page and clicks Submit.
In the response, I retrieve the altered text of the XML configuration. At this point, ALL escape sequences have been reverted by the process of displaying them in a web page.
I attempt to load the string into an XML object (XmlElement, XElement, whatever). KABOOM.

The problem is that serialization escapes attribute strings, but this gets lost in translation along the way.

For example, let's say I have an object that has a regex. Here's the configuration as it arrives at the web server:

<Configuration> <Validator Expression="[^&lt;]" /> </Configuration> 

So, I put this into a textarea, where it looks like this to the user:

 <Configuration> <Validator Expression="[^<]" /> </Configuration> 

So the user makes a slight modification and submits the changes back. On the web server, the response string looks like this:

 <Configuration> <Validator Expression="[^<]" /> <Validator Expression="[^&]" /> </Configuration> 

So, the user has added another validator thingie, and now BOTH have attributes with illegal characters. If I try to load this into any XML object, it throws an exception because < and & are not valid inside an attribute value. I CANNOT, CANNOT, CANNOT use any kind of encoding function, as it encodes the whole bloody thing:

var result = Server.HtmlEncode(editConfig);

leads to

 &lt;Configuration&gt; &lt;Validator Expression="[^&lt;]" /&gt; &lt;Validator Expression="[^&amp;]" /&gt; &lt;/Configuration&gt; 

This is NOT valid XML. If I try to load this into an XML element of any kind, I will be hit by a falling anvil. I do not like falling anvils.

SO, the question remains... Is the ONLY way to get this XML string ready for parsing into an XML object to use regex replaces? Is there any way to "turn off constraints" when I load? How do you get around this?


One last response, and then I'm wiki-izing this, since I don't think there is a valid answer.

The XML I place in the textarea IS valid, escaped XML. The process of 1) putting it in the textarea, 2) sending it to the client, 3) displaying it to the client, 4) submitting the form it's in, 5) sending it back to the server and 6) retrieving the value from the form REMOVES ANY AND ALL ESCAPES.

Let me say it again: I am not un-escaping ANYTHING. Just displaying it in the browser does this!

Things to mull over: is there a way to prevent this un-escaping from happening in the first place? Is there a way to take almost-valid XML and "clean" it in a safe manner?


This question now has a bounty on it. To collect the bounty, you demonstrate how to edit VALID XML in a browser window WITHOUT a third-party / open source tool, in a way that doesn't require me to use regex to escape attribute values manually, that doesn't require users to escape their attributes, and that doesn't break when round-tripping (&amp;amp; etc.).

+4
8 answers

Erm ... How do you serialize? In general, an XML serializer should never generate invalid XML.

/EDIT in response to your update: Do not display invalid XML to your user to edit! Instead, display the properly escaped XML in the TextBox. Repairing broken XML is not fun, and I actually see no reason not to display / edit the XML in a valid, escaped form.

Again I could ask: how do you display the XML in the TextBox? You seem to intentionally unescape the XML at some point.

/EDIT in response to your last comment: well yes, obviously, since it can contain HTML. You need to escape your XML properly before writing it out into an HTML page. By that, I mean the whole XML. So this:

 <foo mean-attribute="&lt;"> 

becomes the following:

 &lt;foo mean-attribute="&amp;lt;"&gt; 
+7

Of course, when you put entity references inside a textarea, they come out unescaped. Textareas aren't magic: you have to &escape; everything you put in them, just like every other element. Browsers may display a raw '<' in a textarea, but only because they are trying to clean up your mistakes.

So, if you put editable XML in a textarea, you need to escape the attribute value once to make it valid XML, and then escape the whole XML again to make it valid HTML. The final source you want to appear in the page would be:

 <textarea name="somexml"> &lt;Configuration&gt; &lt;Validator Expression="[^&amp;lt;]" /&gt; &lt;Validator Expression="[^&amp;amp;]" /&gt; &lt;/Configuration&gt; </textarea> 
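A minimal C# sketch of that double escaping, for illustration only. It assumes LINQ to XML for the first level and uses `WebUtility.HtmlEncode` (from `System.Net`) as a stand-in for whatever HTML encoder the page framework offers; the variable names are made up for this example.

```csharp
using System;
using System.Net;
using System.Xml.Linq;

// Level 1: the XML writer escapes the attribute value ("[^<]" becomes "[^&lt;]").
var config = new XElement("Configuration",
    new XElement("Validator", new XAttribute("Expression", "[^<]")));
string xml = config.ToString(SaveOptions.DisableFormatting);

// Level 2: escape the whole XML text again so it is valid HTML content.
string html = "<textarea name=\"somexml\">" + WebUtility.HtmlEncode(xml) + "</textarea>";

Console.WriteLine(xml);
Console.WriteLine(html);
```

`WebUtility.HtmlEncode` also encodes the double quotes, which is harmless inside a textarea; the browser decodes everything once before the user sees it.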

The question is based on a misunderstanding of the content model of the textarea element. A validator would have picked up the problem right away.

ETA re comment: Well, what problem remains? That was the issue on the serialization side. All that remains is parsing it back in, and for that you have to assume the user can create well-formed XML.

Trying to parse non-well-formed XML in order to allow errors such as an unescaped '<' or '&' in an attribute value is a lost cause, and completely against how XML is supposed to work. If you cannot trust your users to write well-formed XML, give them an easier, non-XML interface, such as a simple text list of regexp strings separated by newlines.
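To illustrate the non-XML interface, here is a rough sketch that turns a newline-separated list of regexes into the configuration XML. `BuildConfiguration` is a hypothetical helper, and the element and attribute names are simply taken from the question's example. Because the XML is built with LINQ to XML, the writer does all the escaping and the user never types an entity.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: one regular expression per line of the submitted text.
static XElement BuildConfiguration(string submittedText) =>
    new XElement("Configuration",
        submittedText
            .Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(line => new XElement("Validator", new XAttribute("Expression", line))));

// Raw regexes, exactly as a user would type them into the textarea.
var built = BuildConfiguration("[^<]\n[^&]\n[^\"<]");
Console.WriteLine(built);
```

The obvious limitation is that a regex containing a literal line break cannot be entered this way.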

+5

As you say, the normal serializer should escape everything for you.

The problem, then, is the text block: you need to handle anything that passes through the text block yourself.

You could try HttpUtility.HtmlEncode(), but I think the simplest method is to just encase anything you pass through the text block in a CDATA section.

Normally, of course, I would want everything properly escaped rather than relying on the CDATA "crutch", but I would also want to use the built-in tools to do the escaping. For something that is edited in its "hibernated" state by a user, I think CDATA might be the way to go.

Also see this earlier question:
Best way to encode text data for XML


Update
Based on a comment on another answer, I realized that you are showing users the markup, not just the content. XML parsers are, well, picky. I think the best thing you could do in this case is to check for well-formedness before accepting the edited XML.

Perhaps try to automatically correct certain kinds of errors (like the bad ampersands from my linked question), but then get the line number and column number of the first validation error from the .NET XML parser and use that to show users where their mistake is, until they give you something acceptable. Bonus points if you also validate against a schema.
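A sketch of that well-formedness check, assuming the edited text is parsed with `XElement.Parse`. `IsWellFormed` is a hypothetical helper name; `XmlException.LineNumber` and `XmlException.LinePosition` are the properties that carry the position of the first error.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper: true when the text parses; otherwise "error" carries the parser's position.
static bool IsWellFormed(string editedXml, out string error)
{
    try
    {
        XElement.Parse(editedXml);
        error = "";
        return true;
    }
    catch (XmlException ex)
    {
        error = $"Line {ex.LineNumber}, column {ex.LinePosition}: {ex.Message}";
        return false;
    }
}

// The unescaped "<" inside the attribute value makes this text non-well-formed.
string edited = "<Configuration> <Validator Expression=\"[^<]\" /> </Configuration>";
if (!IsWellFormed(edited, out string problem))
{
    Console.WriteLine(problem); // show this next to the textarea and ask the user to fix it
}
```

Schema validation is left out here; it would need an `XmlSchemaSet` on top of this check.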

+1

You could take a look at something like TinyMCE, which allows you to edit HTML in a rich text box. If you cannot configure it to do exactly what you want, you could use it as inspiration.

+1

Note: Firefox (in my test) does not unescape in text areas as you describe. Specifically, this code:

 <textarea cols="80" rows="10" id="1"></textarea>

 <script>
 elem = document.getElementById("1");

 elem.value =
   '<Configuration>\n' +
   '  <Validator Expression="[^&lt;]" />\n' +
   '</Configuration>';
 alert(elem.value);
 </script>

is alerted and displayed to the user unchanged, as:

 <Configuration> <Validator Expression="[^&lt;]" /> </Configuration> 

So, one (non-viable?) solution may be for your users to use Firefox.


It seems that two parts of your question have been revealed:

1 The XML that you display is getting unescaped.

For example, "&lt;" is unescaped as "<". But since "<" is also unescaped as "<", information is lost and you cannot get it back.

One solution is to escape all the "&" characters, so that "&lt;" becomes "&amp;lt;". The textarea will then unescape this as "&lt;". When you read it back, it will be as it was in the first place. (I assume that the textarea actually changes the string, but Firefox doesn't behave the way you report, so I can't verify this.)

Another solution (mentioned already, I think) is to build / buy / borrow a custom text area (not bad if simple, but there are all the editing keys, ctrl-C, ctrl-shift-left, etc.).

2 You would like users not to have to bother with escaping.

You're in escape-hell:

A regex replace will mostly work... but how can you reliably detect the closing quote (") when the user can (legitimately, within the terms you have given) enter:
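Here is a rough C# sketch of that round trip. `WebUtility.HtmlDecode` only stands in for what the browser does when it parses the textarea content, which is an assumption made for illustration; in a real page the browser and the form post do that step.

```csharp
using System;
using System.Net;
using System.Xml.Linq;

// Well-formed XML as produced by the serializer.
string original = "<Configuration><Validator Expression=\"[^&lt;]\" /></Configuration>";

// Server side: escape every "&" before writing the text into the textarea.
string written = original.Replace("&", "&amp;");

// Stand-in for the browser, which decodes character references in textarea content.
string posted = WebUtility.HtmlDecode(written);

// The posted text equals the original, so it parses and the attribute value survives.
XElement parsed = XElement.Parse(posted);
string expression = parsed.Element("Validator")?.Attribute("Expression")?.Value ?? "";
Console.WriteLine(expression);
```

Escaping only "&" leaves raw "<" characters in the page source, so a full HTML encode of the text is the safer choice when the content might contain something like "</textarea>".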

 <Configuration> <Validator Expression="[^"<]" /> </Configuration> 

Looking at this from the point of view of the regex syntax, it also cannot tell whether the final " is part of the regex or the end of it. Regex syntax usually solves this problem with an explicit terminator, for example:

 /[^"<]/ 

If users used this syntax (with the terminator), and you wrote a parser for it, you could determine when the regex has ended, and therefore that the next " character is not part of the regex but part of the XML, and therefore which parts need to be escaped. I am not saying that you should do this! I am saying it is theoretically possible. It's pretty far from quick and dirty.

BTW: the same problem arises for text inside an element. Within the terms you have given, the following is legitimate, but has the same parsing problem:

 <Configuration> <Expression></Expression></Expression> </Configuration> 

The basic rule in a syntax that allows "any text" is that the delimiter must be escaped (for example, " or <), so that the end can be recognized. Most syntaxes also escape a bunch of other things, for convenience / inconvenience. (EDIT: it also needs an escape for the escape character itself: for XML it is "&", which when literal is escaped as "&amp;". For regex it is the C / unix-style "\", which when literal is escaped as "\\".)

Nest syntaxes, and you're in escape-hell.

One simple solution for you is to tell your users: this is a quick and dirty configuration editor, so you're not getting any fancy "no need to escape" namby-pamby:

  • List the characters and their escapes next to the text area, for example: "<" as "&lt;".
  • For XML that will not validate, show them the list again.

Looking back, I see that bobince gave the same basic answer before me.

+1

Wrapping all the text in CDATA would give you another escape mechanism that (1) saves users from escaping manually and (2) lets text that was automatically unescaped by the textarea be read back correctly.

  <Configuration> <Validator Expression="<![CDATA[ [^<] ]]>" /> </Configuration> 

:-)

+1

This special character, "<", should be replaced with other characters (an entity reference) so that your XML is valid. Check this link for the XML special characters:

http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references

Try also to encode the contents of the TextBox before sending it to the deserializer:

 // HttpServerUtility has no public constructor, so call the static HttpUtility (System.Web) instead.
 string encodedText = HttpUtility.HtmlEncode(text);
0

Is this really my only option? Isn't this a common enough problem that it has a solution somewhere in the framework?

 private string EscapeAttributes(string configuration)
 {
     // Match a "<" that sits inside a double-quoted attribute value.
     var lt = @"(?<=\w+\s*=\s*""[^""]*)<(?=[^""]*"")";
     configuration = Regex.Replace(configuration, lt, "&lt;");
     return configuration;
 }

(edit: removed the ampersand replacement, as it causes round-tripping problems)

0
