Yes, I think so. Characters are probably stored as wide (UCS-2) strings. They may be UTF-16, in which case characters outside the BMP (Basic Multilingual Plane) take two 16-bit code units each, but I believe those characters are not fully supported. Read this blog post about the problems of implementing UTF-16 in ECMAScript.
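
As a quick illustration (a minimal sketch of standard ECMAScript behaviour, nothing engine-specific), a character outside the BMP occupies two 16-bit code units, so .length and charCodeAt report the two surrogate halves rather than one character:

    // U+1D306 TETRAGRAM FOR CENTRE lies outside the BMP,
    // so it is stored as a surrogate pair of two 16-bit code units.
    var s = '\uD834\uDF06';                      // the same character as '𝌆'
    console.log(s.length);                       // 2  (code units, not characters)
    console.log(s.charCodeAt(0).toString(16));   // 'd834' (high surrogate)
    console.log(s.charCodeAt(1).toString(16));   // 'df06' (low surrogate)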
Most modern languages store their strings with two-byte characters, so you have full support for all spoken languages. It costs a little extra memory, but that is peanuts for any modern computer with plenty of RAM. Storing the string in the more compact UTF-8 would make processing more complex and slower, so UTF-8 is mainly used for transport. ASCII supports only the Latin alphabet without diacritics; ANSI is still limited and needs a specific code page to be interpreted correctly.
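
To make the size trade-off concrete (a minimal sketch; it assumes TextEncoder, which is available in modern browsers and Node.js but not in older engines), you can compare the UTF-16 code-unit count of a string with its UTF-8 byte length:

    // Compare UTF-16 code units (string.length) with UTF-8 bytes for the same text.
    var encoder = new TextEncoder();             // always encodes to UTF-8
    ['hello', 'héllo', 'Привет', '日本語'].forEach(function (text) {
      console.log(text, 'UTF-16 units:', text.length,
                  'UTF-8 bytes:', encoder.encode(text).length);
    });

For plain ASCII the UTF-8 form is half the size, but for Cyrillic or CJK text it grows to two or three bytes per character, which is why it is treated here as a transport format rather than an in-memory one.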
Section 4.3.16 of ECMA-262 explicitly defines a "String value" as "a primitive value that is a finite ordered sequence of zero or more 16-bit unsigned integers." This suggests that programs will use these 16-bit values as UTF-16 text, but it is legal to use a string simply to store any immutable array of unsigned shorts.
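
For instance (a minimal sketch of standard behaviour), nothing stops you from building a string out of code units that do not form valid UTF-16 text, such as an unpaired surrogate:

    // A lone high surrogate is not valid UTF-16 text, yet it is a perfectly
    // legal one-element "array of unsigned shorts" as far as the spec is concerned.
    var lone = String.fromCharCode(0xD800);
    console.log(lone.length);                      // 1
    console.log(lone.charCodeAt(0).toString(16));  // 'd800'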
Note that character size is not the only thing that determines the size of a string. I don't know the exact implementation (and it may vary), but strings tend to have a 0x00 terminator to make them compatible with PChars, and they probably have a header that contains the length of the string, possibly some reference-count information, and maybe even an encoding marker. A single-character string can easily consume 10 bytes or more (yes, that's 80 bits).
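
If you want a rough feel for that overhead (a very approximate sketch, Node.js-only; it assumes V8, and the numbers vary by engine, version, and GC timing), you can compare heap usage before and after allocating many small distinct strings:

    // Run with: node --expose-gc overhead.js   (the gc() call keeps the numbers steadier)
    var COUNT = 1000000;
    if (global.gc) global.gc();
    var before = process.memoryUsage().heapUsed;
    var strings = new Array(COUNT);
    for (var i = 0; i < COUNT; i++) {
      // String(i) + '!' produces distinct short strings instead of hitting
      // the engine's single-character string cache.
      strings[i] = String(i) + '!';
    }
    var after = process.memoryUsage().heapUsed;
    // The figure includes the array slot (~8 bytes) as well as the string itself,
    // but it is still far more than 2 bytes per character.
    console.log(((after - before) / COUNT).toFixed(1) + ' bytes per short string');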