Why does casting a VARCHAR UTF-8 column to XML require converting to NVARCHAR and changing the encoding?

I am trying to convert data in a VARCHAR column to XML, but I get errors with certain characters. Running this ...

-- This fails
DECLARE @Data VARCHAR(1000) = '<?xml version="1.0" encoding="utf-8"?><NewDataSet>Test¦</NewDataSet>';
SELECT CAST(@Data AS XML) AS DataXml

... leads to the following error

Msg 9420, Level 16, State 1, Line 3
XML parsing: Line 1, character 55, illegal xml character

It looks like the broken bar character (¦) is causing the error, but I thought it was a valid UTF-8 character. According to the XML specification, it should also be a valid XML character.
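One way to check that the XML parser accepts the character itself is to pass it a Unicode (NVARCHAR) literal with no encoding declaration. This sketch is not from the original question. Because the `N'...'` literal is already UTF-16, no byte-level decoding from a code page or from UTF-8 takes place.

```sql
-- N'' literal: the text is already Unicode (UTF-16), so no encoding declaration is involved
SELECT CAST(N'<NewDataSet>Test¦</NewDataSet>' AS XML) AS DataXml;
```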

When I change it to this ...

-- This works
DECLARE @Data VARCHAR(1000) = '<?xml version="1.0" encoding="utf-8"?><NewDataSet>Test¦</NewDataSet>';
SELECT CAST(REPLACE(CAST(@Data AS NVARCHAR(MAX)), 'encoding="utf-8"', '') AS XML) AS DataXml

... it works without errors (replacing the encoding string with utf-16 also works). I am using SQL Server 2008 R2 with the SQL_Latin1_General_CP1_CI_AS collation.
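The utf-16 variant mentioned in the question could be written like this. It is a sketch reusing the same `@Data` value. It assumes that, since SQL Server stores NVARCHAR as UTF-16, changing the declaration to utf-16 makes it match the actual encoding of the string instead of deleting it.

```sql
DECLARE @Data VARCHAR(1000) = '<?xml version="1.0" encoding="utf-8"?><NewDataSet>Test¦</NewDataSet>';

-- The VARCHAR -> NVARCHAR conversion decodes 0xA6 via the collation's code page into U+00A6;
-- the declaration is then rewritten so it agrees with the UTF-16 string
SELECT CAST(REPLACE(CAST(@Data AS NVARCHAR(MAX)), N'encoding="utf-8"', N'encoding="utf-16"') AS XML) AS DataXml;
```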

So why do I need to convert to NVARCHAR and remove the encoding="utf-8" declaration for this to work?


Edit

It also works if I keep VARCHAR and only remove the encoding declaration ...

DECLARE @Data VARCHAR(1000) = '<?xml version="1.0" encoding="utf-8"?><NewDataSet>Test¦</NewDataSet>';
SELECT CAST(REPLACE(@Data, 'encoding="utf-8"', '') AS XML) AS DataXml

... so the problem seems to be how SQL Server handles the utf-8 encoding declaration.
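Applied to a column rather than a single variable, the question's working approach might look like the sketch below. The table variable `@SourceTable` and its columns are hypothetical stand-ins for a real table. The sketch also assumes that every row spells the declaration exactly as `encoding="utf-8"`. A declaration written with single quotes or extra spaces would not be matched by `REPLACE`.

```sql
-- Hypothetical stand-in for the real table: one sample row in a table variable
DECLARE @SourceTable TABLE (Id INT, DataColumn VARCHAR(1000));
INSERT INTO @SourceTable (Id, DataColumn)
VALUES (1, '<?xml version="1.0" encoding="utf-8"?><NewDataSet>Test¦</NewDataSet>');

-- Decode each row to Unicode, then drop the declaration so the parser does not expect UTF-8 bytes
SELECT Id,
       CAST(REPLACE(CAST(DataColumn AS NVARCHAR(MAX)), N'encoding="utf-8"', N'') AS XML) AS DataXml
FROM @SourceTable;
```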


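The character is Unicode U+00A6 BROKEN BAR, not U+007C VERTICAL LINE, and U+00A6 is not an ASCII character. VARCHAR stores single-byte characters from the code page of its collation (Windows-1252 for SQL_Latin1_General_CP1_CI_AS), where ¦ is the single byte 0xA6. NVARCHAR stores Unicode (UTF-16). When the declaration says encoding="utf-8", the parser reads the VARCHAR bytes as UTF-8. A lone 0xA6 is not a valid UTF-8 sequence, because in UTF-8 U+00A6 is encoded as the two bytes 0xC2 0xA6, so you get "illegal xml character". Without the declaration, or after converting to NVARCHAR, the character is decoded correctly.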
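A quick way to see the difference described in this answer is to look at the raw bytes. This sketch assumes the database default collation is SQL_Latin1_General_CP1_CI_AS (code page 1252), as in the question. Under a different code page, the VARCHAR byte may differ.

```sql
-- Non-N literal: converted to the database code page (assumed 1252) when assigned to VARCHAR
DECLARE @c VARCHAR(10) = '¦';

SELECT CAST(@c AS VARBINARY(10))                       AS VarcharBytes,  -- 0xA6: one byte in code page 1252
       CAST(CAST(@c AS NVARCHAR(10)) AS VARBINARY(10)) AS NvarcharBytes, -- 0xA600: U+00A6 as UTF-16LE
       UNICODE(CAST(@c AS NVARCHAR(10)))               AS CodePoint;     -- 166, i.e. U+00A6
```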
