We have an application that takes a text string entered by a user in a web form and wraps it in XML. To complicate matters a bit, the XML is sent as the body of an Outlook email message.
Because users can paste almost anything into a web form (usually from Word), the text string may contain characters outside 7-bit ASCII, such as the curly opening and closing double quotes.
The string travels through email unchanged, but when we load it with the Microsoft XML parser, it complains (quite rightly) that the XML contains invalid characters.
A quick fix is to put encoding="iso-8859-1" in the header (the XML declaration). However, it would arguably be better to encode the XML as genuine UTF-8 from the start, since I have read articles arguing that the world would be a more harmonious place if every XML document were encoded in UTF-8.
But will we run into problems because the XML document is actually transmitted as the body of an email message? I understand that UTF-8 is a variable-length encoding. I assume it uses 7-bit ASCII plus escape characters to indicate "more data".
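To make the variable-length point concrete, here is a minimal Python sketch (Python is used purely for illustration and is not part of our application; the helper name `utf8_bytes` is made up). As far as I can tell, it shows that an ASCII letter stays a single byte below 0x80, while a curly quote becomes several bytes that all have the high bit set, so the message body would no longer be 7-bit clean.

```python
# Illustrative only: how UTF-8 encodes an ASCII letter versus a curly quote.

def utf8_bytes(ch: str) -> bytes:
    """Return the UTF-8 encoding of a single character (hypothetical helper)."""
    return ch.encode("utf-8")

# "A" is plain ASCII; U+201C is the left (opening) curly double quote.
for ch in ["A", "\u201c"]:
    data = utf8_bytes(ch)
    seven_bit = all(b < 0x80 for b in data)
    print(f"U+{ord(ch):04X}", [hex(b) for b in data],
          "7-bit clean" if seven_bit else "needs 8-bit transport")
```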
Another option is to declare UTF-8 but replace every non-ASCII character with a numeric character reference of the form &#nnn;.
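A rough sketch of that last option, again in Python for illustration only. The `<message>` element and the function name `wrap_in_xml` are assumptions, not our real schema. It relies on the standard `xmlcharrefreplace` error handler, which replaces each character that cannot be encoded as ASCII with a decimal character reference.

```python
from xml.sax.saxutils import escape

def wrap_in_xml(user_text: str) -> bytes:
    """Wrap text in a hypothetical <message> element using only ASCII bytes."""
    # Escape &, < and > first so the user's text cannot break the markup.
    escaped = escape(user_text)
    # Replace every non-ASCII character with a reference such as &#8220;.
    ascii_body = escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")
    doc = f'<?xml version="1.0" encoding="UTF-8"?><message>{ascii_body}</message>'
    # The document is now pure ASCII, which is also valid UTF-8.
    return doc.encode("ascii")

print(wrap_in_xml("\u201cHello\u201d & goodbye"))
```

Note that the numbers in the references are Unicode code points (8220 and 8221 for the curly quotes), not the byte values Word uses in its Windows code page. Because the result contains only 7-bit bytes, it should pass through an email body regardless of how the mail system treats 8-bit data.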
Any advice on this rather complex area would be appreciated.
Regards, Rob.