In SQL Server 2012, I have a table with an nvarchar column with a Latin1_General_100_CI_AS_ SC collation, which should support Unicode surrogate characters or extra characters.
When I run this query:
select KeyValue from terms where KeyValue = N'➰'
(above - Unicode SC)
above - curly curly loop symbol with code 10160 (x27B0)
The result is a hundred different similar single-character entries, although they all have different UTF-16 code points. Is it due to sorting? Why is there no exact match?
EDIT: Now I think this has to do with sorting. There seems to be a "undefined" character group in the UTF-16 range, over 1733 characters, and they map to this mapping. Although characters with codes above 65535 are considered unique, these queries return exact matches.
The two queries below have different results:
select KeyValue from terms where KeyValue = N'π'
returns 3 lines: π and π and ᴨ
select KeyValue from terms where KeyValue LIKE N'π'
returns 2 lines: π and π
Why is this?
This is the strangest of all. This request:
select KeyValue from terms where KeyValue like N'➰%'
returns ALMOST ALL records in a table that has many regular Latin characters with multiple characters, such as "8w" or "apple". 90% of those who do not return begin with "æ". What's happening?
NOTE. To give this a little context, these are all Wikipedia articles, not random lines.
source share