Is it possible for SQL Server to convert a collation to UTF-8 / UTF-16?

In the project I'm working on, my data is stored in SQL Server with the Danish_Norwegian_CI_AS collation. The data is read via FreeTDS and ODBC into Python, which processes it as UTF-8. Some characters, such as å, ø and æ, come out incorrectly encoded, which has brought the project to a halt.

I spent a couple of hours reading about the confusing world of encodings, collations and code pages, and I feel I now have a better understanding of the whole picture.

Some of the articles I read make me think that this would be possible: to specify, in the SQL SELECT statement, that the collated data should be encoded as UTF-8 on output.

The reason I think it's possible is this article, which shows an example of how to get tables with different collations to play well together.

Any pointers in the direction of converting the collation to UTF-8 / UTF-16 would be greatly appreciated!

EDIT: I have read that SQL Server provides a Unicode option via nchar, nvarchar and ntext, and that the other string types, char, varchar and text, are encoded according to the collation that is set. I have also read that the Unicode types are encoded in the UCS-2 variant of UTF-16 (I hope I remember that right). So, for collation-encoded and Unicode tables to play well together, there must be a conversion function, no?
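To make the difference between these encodings concrete, here is a minimal Python sketch. It assumes the varchar data uses Windows code page 1252 (the code page behind the Danish_Norwegian collations) and that nvarchar data arrives as little-endian UTF-16; the sample string is made up for illustration.

```python
# Sample text containing the Danish/Norwegian letters from the question.
text = "åøæ"

# The same text as bytes in each encoding.
as_cp1252 = text.encode("cp1252")    # varchar under a code page 1252 collation (assumption)
as_utf16 = text.encode("utf-16-le")  # nvarchar-style storage, 2 bytes per character here
as_utf8 = text.encode("utf-8")       # what the Python side expects

print(as_cp1252)
print(as_utf16)
print(as_utf8)

# Bytes only turn back into the right text when decoded with the
# encoding that produced them.
assert as_cp1252.decode("cp1252") == text
assert as_utf16.decode("utf-16-le") == text

# Decoding UTF-8 bytes as cp1252 yields mojibake instead of the letters.
mojibake = as_utf8.decode("cp1252")
print(mojibake)
```

If the output of a query looks like the last line, the bytes are probably fine and are only being decoded with the wrong encoding somewhere between the driver and the application.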

+8
sql-server unicode utf-8 collation pyodbc
2 answers

After 4 months, I finally found the answer to my problem. It turns out that it had nothing to do with the FreeTDS driver or with the database setup:

It was the pyodbc connect function, which apparently requires a flag: unicode_results=True

Posted here to help other unfortunate, doomed ships wandering aimlessly in the dark, looking for a clue.
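A minimal sketch of how that flag might be passed, for illustration only. The DSN name and credentials are hypothetical placeholders, and whether `unicode_results` is accepted depends on the installed pyodbc version, so treat this as an assumption rather than a guaranteed recipe.

```python
def build_connection_string(dsn, user, password):
    """Assemble an ODBC connection string from (hypothetical) DSN credentials."""
    return "DSN={};UID={};PWD={}".format(dsn, user, password)


def open_connection(dsn, user, password):
    """Open a pyodbc connection that asks for unicode results."""
    # Imported here so the helper above can be used without the driver installed.
    import pyodbc

    # unicode_results=True is the flag named in the answer above; support for it
    # depends on the pyodbc version in use (assumption).
    return pyodbc.connect(
        build_connection_string(dsn, user, password),
        unicode_results=True,
    )


if __name__ == "__main__":
    # Placeholder values, not real credentials.
    print(build_connection_string("my_dsn", "my_user", "my_password"))
```

With a working DSN, `open_connection("my_dsn", "my_user", "my_password")` would return a connection whose query results can then be checked for å, ø and æ.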

+10

It seems that SQL Server does not support UTF-8 (see here), but you can try changing the collation in the SELECT, for example:

 SELECT Account COLLATE SQL_Latin1_General_CP1_CI_AS from Data 

You can also remove accents using this solution: How to remove accents and all characters <> a..z in SQL Server?

Another solution might be to cast your column to nvarchar:

 SELECT cast (Account as nvarchar) as NewAccount from Data 

where Account is varchar in your initial table.

If, for example, you try:

 SELECT cast(cast(N'ţ' as varchar) as nvarchar) 

the end result is "ţ"
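Whether such a round trip preserves a character depends on the code page behind the varchar collation. The following Python sketch is a rough analogy, not a reproduction of SQL Server's behaviour: it assumes code page 1252 for the varchar side and uses Python's `errors="replace"`, which substitutes `?` for characters the code page lacks (SQL Server may pick a different substitute). The `round_trip` helper is hypothetical.

```python
def round_trip(text, codepage="cp1252"):
    """Encode to a single-byte code page and back, like nvarchar -> varchar -> nvarchar."""
    # errors="replace" substitutes "?" for characters missing from the code page.
    return text.encode(codepage, errors="replace").decode(codepage)


T_CEDILLA = "\u0163"  # the letter ţ used in the example above

print(round_trip("åøæ"))                # present in cp1252, so the letters survive
print(round_trip(T_CEDILLA))            # absent from cp1252, so it is replaced
print(round_trip(T_CEDILLA, "cp1250"))  # present in cp1250, so it survives
```

The Danish and Norwegian letters from the question are all in code page 1252, so a cast to nvarchar should not lose them; characters outside the column's code page are the ones at risk.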

+3