Does nvarchar always store each character in two bytes?

I had (perhaps naively) assumed that in SQL Server nvarchar stores each character in two bytes. But that does not seem to always be the case: the documentation states that some characters may occupy more bytes. Does anyone have a definitive answer?

3 answers

Yes, it uses 2 bytes per character. Use DATALENGTH to get the storage size; you cannot use LEN, because LEN only counts characters. See: Differences between LEN and DATALENGTH in SQL Server

    DECLARE @n NVARCHAR(10);
    DECLARE @v VARCHAR(10);
    SELECT @n = N'A', @v = 'A';
    -- DATALENGTH returns the number of bytes, not the number of characters
    SELECT DATALENGTH(@n), DATALENGTH(@v);
    ---------
    2    1

Here's what Books Online says: http://msdn.microsoft.com/en-us/library/ms186939.aspx

Character data types that are either fixed-length (nchar) or variable-length (nvarchar) Unicode data and use the UNICODE UCS-2 character set.

nchar [(n)]

Fixed-length Unicode character data of n characters. n must be a value from 1 through 4,000. The storage size is two times n bytes. The ISO synonyms for nchar are national char and national character.

nvarchar [(n | max)]

Variable-length Unicode character data. n can be a value from 1 through 4,000. max indicates that the maximum storage size is 2^31-1 bytes. The storage size, in bytes, is two times the number of characters entered + 2 bytes. The data entered can be 0 characters in length. The ISO synonyms for nvarchar are national char varying and national character varying.

That said, Unicode compression was introduced in SQL Server 2008 R2, so it can store ASCII characters in 1 byte each. You can read about Unicode compression here: http://sqlblog.com/blogs/aaron_bertrand/archive/2009/08/11/sql-server-2008-r2-a-quick-experiment-in-unicode-compression.aspx


Given that Unicode defines more than 65,536 characters, it should be obvious that not every character can fit in two octets (i.e. 16 bits).

SQL Server, like most Microsoft products (Windows, .NET, NTFS, …), uses UTF-16 to store text, in which a character takes two or four octets, although, as @SQLMenace points out, current versions of SQL Server use compression to reduce this.
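As a rough illustration of the two-or-four-octet rule, here is a small Python sketch (not SQL Server code) that counts the bytes a string needs in UTF-16. The helper name `utf16_byte_length` is hypothetical; the sketch shows the encoding arithmetic only and says nothing about compression or SQL Server's on-disk format.

```python
def utf16_byte_length(text: str) -> int:
    """Return the number of bytes needed to encode text as UTF-16 without a BOM."""
    # "utf-16-le" is used because plain "utf-16" prepends a 2-byte byte-order mark.
    return len(text.encode("utf-16-le"))


if __name__ == "__main__":
    # One ASCII letter, one accented letter, one CJK character inside the
    # 16-bit range, and one character above U+FFFF that needs a surrogate pair.
    samples = ["A", "\u00e9", "\u4e2d", "\U00020000"]
    for ch in samples:
        print(f"U+{ord(ch):04X}: {utf16_byte_length(ch)} bytes")
```

Characters up to U+FFFF come out as two bytes each, while anything above that range takes four.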


My understanding of this issue is that SQL Server uses UCS-2 internally, but that its UCS-2 implementation has been hacked to support a subset of characters of up to 4 bytes in the GB18030 character set. These are stored as UCS-2 but are transparently converted back to multi-byte characters by the database engine when queried.

Surrogate/supplementary characters are not fully supported: the implementation of a number of SQL Server string functions does not support surrogate pairs, as detailed here.
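To see why a string function that is not surrogate-aware can give surprising results, here is a small Python sketch (again, not SQL Server code). It compares the number of 16-bit UTF-16 code units in a string with the number of Unicode code points. The helper names `utf16_code_units` and `has_surrogate_pairs` are hypothetical, and the sketch assumes well-formed input with no lone surrogates.

```python
def utf16_code_units(text: str) -> int:
    """Count the 16-bit code units needed to represent text in UTF-16."""
    # Each UTF-16 code unit is 2 bytes; "utf-16-le" adds no byte-order mark.
    return len(text.encode("utf-16-le")) // 2


def has_surrogate_pairs(text: str) -> bool:
    """Return True if text contains a character above U+FFFF."""
    # Python's len() counts code points, so any surplus of code units
    # must come from characters encoded as surrogate pairs.
    return utf16_code_units(text) > len(text)


if __name__ == "__main__":
    word = "a\U00020000b"  # 3 code points, the middle one is supplementary
    print("code points:", len(word))
    print("UTF-16 code units:", utf16_code_units(word))
    print("contains surrogate pairs:", has_surrogate_pairs(word))
```

A function that simply counts code units would report the larger of the two numbers for such a string, which is the kind of mismatch the answer describes.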
