How much RAM does each character use in an ECMAScript / JavaScript string?

The question is quite simple: how much RAM, in bytes, does each character in an ECMAScript / JavaScript string consume?

My guess is two bytes, since the standard says characters are stored as unsigned 16-bit integers.

Does this mean that each character is always two bytes?

javascript string
1 answer

Yes, I think so. Characters are probably stored as wide (UCS-2) strings. They could be UTF-16, in which case characters outside the BMP (Basic Multilingual Plane) take two 16-bit words each instead of one, but I believe those characters are not fully supported. Read this blog post about the problems with UTF-16 in ECMAScript implementations.
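To illustrate the difference between characters inside and outside the BMP, here is a minimal sketch. The helper name `describeString` is made up for this example, and the byte figure reflects the 16-bit model from the specification, not what any particular engine actually allocates.

```javascript
// Hypothetical helper: compares UTF-16 code units with Unicode code points.
function describeString(str) {
  return {
    codeUnits: str.length,              // number of 16-bit units
    codePoints: Array.from(str).length, // string iteration walks code points
    characterBytes: str.length * 2,     // 2 bytes per 16-bit unit (spec model only)
  };
}

const insideBmp = describeString("\u00E9");     // "é", inside the BMP
const outsideBmp = describeString("\u{1F600}"); // an emoji, outside the BMP

console.log(insideBmp);  // one code unit, one code point
console.log(outsideBmp); // two code units (a surrogate pair), one code point
```

Note that `length` counts 16-bit units, so a single character outside the BMP reports a length of 2.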

Most modern languages store their strings with two-byte characters. That covers the characters of practically all living languages. It costs a little extra memory, but that is peanuts for any modern computer with gigabytes of RAM. Storing strings in the more compact UTF-8 makes processing more complex and slower, so UTF-8 is mainly used for transport. ASCII supports only the Latin alphabet without diacritics. ANSI is still limited and needs a specific code page to make sense.
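As a rough illustration of the size trade-off between the two encodings, the sketch below compares byte counts for the same text. The helper name `encodedSizes` is invented for this example, and it assumes an environment where `TextEncoder` is available as a global (modern browsers and recent Node.js versions).

```javascript
// Hypothetical helper: byte size of the same text in UTF-16 and UTF-8.
function encodedSizes(str) {
  return {
    utf16Bytes: str.length * 2,                      // 2 bytes per 16-bit unit
    utf8Bytes: new TextEncoder().encode(str).length, // TextEncoder always emits UTF-8
  };
}

const ascii = encodedSizes("hello");              // plain ASCII text
const cjk = encodedSizes("\u65E5\u672C\u8A9E");   // three CJK characters

console.log(ascii); // UTF-8 is half the size here
console.log(cjk);   // UTF-8 is larger here: 3 bytes per character versus 2
```

UTF-8 is only more compact for text dominated by ASCII; for many other scripts it is the larger of the two.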

Section 4.3.16 of ECMA-262 explicitly defines "String value" as "a primitive value that is a finite ordered sequence of zero or more 16-bit unsigned integers." The specification suggests that programs use these 16-bit values as UTF-16 text, but it is legal to use a string simply to store any immutable array of unsigned shorts.
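The "immutable array of unsigned shorts" reading can be tried out directly. This is only a sketch: `packUint16` and `unpackUint16` are names made up for the example, and the spread call is meant for small arrays only.

```javascript
// Hypothetical helpers: store arbitrary 16-bit values in a string and read them back.
function packUint16(values) {
  // String.fromCharCode keeps only the low 16 bits of each argument.
  return String.fromCharCode(...values);
}

function unpackUint16(str) {
  const out = [];
  for (let i = 0; i < str.length; i++) {
    out.push(str.charCodeAt(i)); // returns the raw 16-bit unit at index i
  }
  return out;
}

// Includes 0 and a lone surrogate (0xD800), neither of which is meaningful text.
const values = [0, 0xD800, 0xFFFF, 65];
const packed = packUint16(values);
const restored = unpackUint16(packed);

console.log(packed.length); // one string element per stored value
console.log(restored);      // the same numbers that went in
```

The string holds values that are not valid UTF-16 text, yet every unit survives the round trip unchanged.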

Note that character size is not the only thing that determines the size of a string. I don't know the exact implementation (and it may vary), but strings tend to have a 0x00 terminator to make them compatible with PChars (null-terminated, C-style strings). They probably also have a header that contains the length of the string and possibly a reference count and even an encoding. A single-character string can easily consume 10 bytes or more (yes, that's 80 bits).
