
Invalid xml character in SQL Insert

I am trying to insert the following XML into a SQL Server xml column:

<?xml version="1.0" encoding="UTF-8"?> <Response> <Ip>xxxx</Ip> <CountryCode>CA</CountryCode> <CountryName>Canada</CountryName> <RegionCode>QC</RegionCode> <RegionName>Québec</RegionName> <City>Dorval</City> <ZipCode>h9p1j3</ZipCode> <Latitude>45.45000076293945</Latitude> <Longitude>-73.75</Longitude> <MetroCode></MetroCode> <AreaCode></AreaCode> </Response> 

The insert code looks like this:

 INSERT INTO Traffic(... , xmlGeoLocation, ...)
 VALUES (
     ...
     <!--- <cfqueryparam CFSQLType="cf_sql_varchar" value="#xmlGeoLocation#">, --->
     '#xmlGeoLocation#',
     ...
 )

Two bad things happen:

  • Québec turns into QuÃ©bec

  • I get the error [Macromedia][SQLServer JDBC Driver][SQLServer]XML parsing: line 8, character 16, illegal xml character

UPDATE:

The incoming test stream is basically single-byte characters.

é is a double-byte character: in UTF-8 it is encoded as the two bytes C3 A9.

Also, I have no control over the XML input.
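To illustrate the C3 A9 point, here is a minimal Python sketch (not part of the ColdFusion/SQL Server setup in the question). It shows how the two UTF-8 bytes of é become two separate characters when something along the way reads them with a single-byte code page. The choice of cp1252 is an assumption; the actual code page used by the driver or server may differ.

```python
# UTF-8 stores é as the two bytes C3 A9.
original = "Québec"
utf8_bytes = original.encode("utf-8")  # b'Qu\xc3\xa9bec'

# Reading those bytes with a single-byte code page (cp1252 is an assumed
# example) maps each byte to its own character: C3 -> Ã, A9 -> ©.
misread = utf8_bytes.decode("cp1252")  # 'QuÃ©bec'

# Decoding with the correct encoding gives the original text back.
roundtrip = utf8_bytes.decode("utf-8")  # 'Québec'
```

If the stored value looks like the `misread` string, the bytes were probably fine and only the decoding step was wrong.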

+6
3 answers

According to this link from the W3C:

HTML has a list of built-in character names, such as &eacute; for é, but XML does not have this. There are only five built-in character entities in XML: &lt;, &gt;, &amp;, &quot; and &apos; for <, >, &, " and ' respectively. You can define your own entities in a document type definition, or you can use any Unicode character (see the next item).

HTML also has numeric character references, such as &#38; for &. You can refer to any Unicode character, but the number is decimal, whereas in Unicode tables the number is usually given in hexadecimal. XML also allows hexadecimal references, for example &#x26;.

This makes me think that &#xE9; may work for the é character.

Also, this link from Microsoft states that:

SQLXML 4.0 relies on the limited DTD support provided by SQL Server. SQL Server allows an internal DTD in xml data type data, which can be used to supply default values and to replace entity references with their expanded contents. SQLXML passes the XML data "as is" (including the internal DTD) to the server. You can convert DTDs to XML Schema (XSD) documents using third-party tools and load the data with inline XSD schemas into the database.

But none of this will help you if you do not control the incoming XML stream. I doubt that you can save é (or any special character other than the built-in character entities mentioned above) in an XML document in a SQL Server XML field without either adding a DTD or replacing the character with its hexadecimal reference. In both cases, you will need to modify the XML before it enters the database.
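For the character-reference route, a rough sketch of that preprocessing step could look like the following. It is written in Python purely for illustration; `to_char_refs` is a hypothetical helper name, and the same loop would have to be ported to whatever layer sits in front of the database. Note that character references are not expanded inside CDATA sections, so this assumes the text is ordinary element content.

```python
def to_char_refs(xml_text: str) -> str:
    """Replace every non-ASCII character with a hexadecimal character reference."""
    return "".join(
        ch if ord(ch) < 128 else f"&#x{ord(ch):X};"
        for ch in xml_text
    )


sample = "<RegionName>Québec</RegionName>"
escaped = to_char_refs(sample)  # '<RegionName>Qu&#xE9;bec</RegionName>'

# The standard library can do the same with decimal references:
decimal_form = "Québec".encode("ascii", "xmlcharrefreplace")  # b'Qu&#233;bec'
```

The result is plain ASCII, so it should no longer matter which single-byte or UTF-8 encoding the string passes through on its way to the server.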

Just a quick example for those who want to go down the route of adding a DTD.

Here's how to add an internal DTD to an XML file that declares an entity for the é character:

 <?xml version="1.0" encoding="UTF-8"?>
 <!DOCTYPE root [<!ENTITY eacute "&#233;">]>
 <root>
   <RegionName>Qu&eacute;bec</RegionName>
 </root>

If you go here and search the page (Ctrl+F) for "eacute", you will find a list of declarations for other characters that you can copy and paste into your own internal DTD.

Edit

You can declare all the entities as shown in the link above: <!ENTITY eacute "&#233;"><!ENTITY .. // Next entity> or just copy them all from this file. I understand that adding an internal DTD to every single XML file you add to the database is not such a good idea, but I would be interested to know whether it fixes your problem for one file.

+1

I would go with stripping off the header (the XML declaration) ...

I have the same problem with a funny little apostrophe (’). I think the problem is that by the time the string is converted to XML, it is no longer UTF-8, but SQL Server still tries to decode it using the encoding named in the header. If it is VARCHAR, it is in the client encoding. If it is NVARCHAR, it is UTF-16. Here are some options I tested:

SQL (varchar, UTF-8):

 SELECT CONVERT(XML,'<?xml version="1.0" encoding="UTF-8"?><t>We’re sorry</t>') 

Error:

 XML parsing: line 1, character 44, illegal xml character 

SQL (nvarchar, UTF-8):

 SELECT CONVERT(XML,N'<?xml version="1.0" encoding="UTF-8"?><t>We’re sorry</t>') 

Error:

 XML parsing: line 1, character 38, unable to switch the encoding 

SQL (varchar, UTF-16):

 SELECT CONVERT(XML,'<?xml version="1.0" encoding="UTF-16"?><t>We’re sorry</t>') 

Error:

 XML parsing: line 1, character 39, unable to switch the encoding 

SQL (nvarchar, UTF-16):

 SELECT CONVERT(XML,N'<?xml version="1.0" encoding="UTF-16"?><t>We’re sorry</t>') 

Worked!
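Since only the combination where the declared encoding matches the string type worked, one way to apply this to XML you do not control is to drop the declaration before handing the text to the database as a Unicode (NVARCHAR) value. Below is a minimal Python sketch of that idea; `strip_xml_declaration` is a hypothetical helper, and I am assuming (not verifying here) that SQL Server then parses the NVARCHAR value as UTF-16 with no declaration to conflict with.

```python
import re

# Matches a leading XML declaration such as
# <?xml version="1.0" encoding="UTF-8"?> plus surrounding whitespace.
_XML_DECL = re.compile(r"^\s*<\?xml[^>]*\?>\s*")


def strip_xml_declaration(xml_text: str) -> str:
    """Remove the XML declaration, if present, from the start of the text."""
    return _XML_DECL.sub("", xml_text, count=1)


doc = '<?xml version="1.0" encoding="UTF-8"?><t>We’re sorry</t>'
body = strip_xml_declaration(doc)  # '<t>We’re sorry</t>'
```

The text has to be decoded correctly into a Unicode string first; stripping the header does not repair bytes that were already misread.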

+9

Try changing this:

 <RegionName>Québec</RegionName> 

to

 <RegionName><![CDATA[Québec ]]></RegionName> 
+1
