I was looking for a generic method in .Net to encode a string for use in an Xml element or attribute, and was surprised when I didn't find one right away. So, before I go too much further, could I just be missing the built-in function?
Assuming for a moment that one really doesn't exist, I'm putting together my own generic EncodeForXml(string data) method, and I'm thinking about the best way to do it.
The data I'm using that prompted all of this can contain bad characters like &, <, ", etc. It can also occasionally contain properly escaped entities: &amp;, &lt;, and &quot;, which means that just using a CDATA section may not be the best idea. That seems kind of clunky anyway; I'd much rather end up with a nice string value that can be used directly in the xml.
I've used a regular expression in the past to catch just the bad ampersands, and I'm thinking of using it here as well as the first step, and then doing a simple replace for the other characters.
So, can this be optimized further without making it too complicated, and is there anything I'm missing?
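    Imports System.Text.RegularExpressions

    Function EncodeForXml(ByVal data As String) As String
        ' Matches an ampersand that is NOT already the start of something shaped like an entity
        Static badAmpersand As New Regex("&(?![a-zA-Z]{2,6};|#[0-9]{2,4};)")

        ' Escape the stray ampersands first, so the entities added below are left alone
        data = badAmpersand.Replace(data, "&amp;")

        Return data.Replace("<", "&lt;").Replace("""", "&quot;").Replace(">", "&gt;")
    End Function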
Sorry to all you C#-only folks: I don't really care which language I use, but I wanted to make the Regex static, and you can't do that in C# without declaring it outside the method, so this will be VB.Net.
Finally, we are still working on .Net 2.0, but if someone can take the final product and turn it into an extension method for the string class, that would be cool too.
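For what the extension-method version might look like, here is a minimal sketch. It assumes a compiler and framework that support extension methods (VB 9 / .Net 3.5 or later, since ExtensionAttribute lives in System.Core), so it would not build as-is against plain .Net 2.0. The module name XmlStringExtensions is made up for illustration, and the Static local becomes a module-level field.

```vbnet
Imports System.Runtime.CompilerServices
Imports System.Text.RegularExpressions

Public Module XmlStringExtensions
    ' Same pattern as in the question: an ampersand that does not
    ' already start something shaped like an entity.
    Private ReadOnly BadAmpersand As New Regex("&(?![a-zA-Z]{2,6};|#[0-9]{2,4};)")

    ' Hypothetical extension method; callable as someString.EncodeForXml()
    <Extension()> _
    Public Function EncodeForXml(ByVal data As String) As String
        If data Is Nothing Then Return Nothing

        ' Stray ampersands first, then the remaining special characters
        data = BadAmpersand.Replace(data, "&amp;")
        Return data.Replace("<", "&lt;").Replace("""", "&quot;").Replace(">", "&gt;")
    End Function
End Module
```

With that in scope, a call would read something like:

```vbnet
Dim encoded As String = "a < b & c".EncodeForXml()   ' a &lt; b &amp; c
```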
Update: The first few answers show that .Net does indeed have built-in ways of doing this. But now that I've started, I kind of want to finish my EncodeForXml() method just for fun, so I'm still looking for ideas for improvement. Notably: a more complete list of characters that should be encoded as entities (perhaps stored in a list/map), and something that gets better performance than doing a .Replace() on immutable strings in serial.
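To make those two improvement ideas concrete, here is a rough sketch of a map-driven, single-pass version that appends to a StringBuilder instead of chaining .Replace() calls. The names XmlEncoder, Entities, EntityAt and EncodeForXmlSinglePass are invented for illustration. The map holds only the five entities that XML itself predefines (amp, lt, gt, quot, apos), and the "already escaped" check reuses the shape of my regex above, so it will still let through anything that merely looks like an entity, such as &nbsp;.

```vbnet
Imports System.Collections.Generic
Imports System.Text
Imports System.Text.RegularExpressions

Public Module XmlEncoder
    ' Hypothetical lookup table: the five characters XML predefines entities for.
    Private ReadOnly Entities As Dictionary(Of Char, String) = BuildEntities()

    ' \G anchors the match at the start position passed to Match(),
    ' so this only asks "is there an entity-shaped token right here?"
    Private ReadOnly EntityAt As New Regex("\G&([a-zA-Z]{2,6}|#[0-9]{2,4});")

    Private Function BuildEntities() As Dictionary(Of Char, String)
        Dim map As New Dictionary(Of Char, String)
        map.Add("&"c, "&amp;")
        map.Add("<"c, "&lt;")
        map.Add(">"c, "&gt;")
        map.Add(""""c, "&quot;")
        map.Add("'"c, "&apos;")
        Return map
    End Function

    Public Function EncodeForXmlSinglePass(ByVal data As String) As String
        If String.IsNullOrEmpty(data) Then Return data

        Dim sb As New StringBuilder(data.Length + 16)
        Dim i As Integer = 0
        While i < data.Length
            Dim c As Char = data(i)

            ' Copy anything that already looks like an entity through untouched
            If c = "&"c Then
                Dim m As Match = EntityAt.Match(data, i)
                If m.Success Then
                    sb.Append(m.Value)
                    i += m.Length
                    Continue While
                End If
            End If

            ' Otherwise escape the character if it is in the map
            Dim replacement As String = Nothing
            If Entities.TryGetValue(c, replacement) Then
                sb.Append(replacement)
            Else
                sb.Append(c)
            End If
            i += 1
        End While

        Return sb.ToString()
    End Function
End Module
```

For example, EncodeForXmlSinglePass("Tom & ""Jerry"" &lt; 3") should give Tom &amp; &quot;Jerry&quot; &lt; 3. The regex only runs when an ampersand is encountered, so most characters cost one dictionary lookup and one append. I have not measured whether this actually beats the chained .Replace() version on typical input.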