C#: What takes up more memory? String or bytearray?

Question

Say I have a string that reads "My text". In which form will it use more memory: as a byte array or as a string?

+7

string c# .net bytearray

Michael robinston May 26, '09 at 22:15

7 answers

It depends on the character encoding of the byte array. You can convert any string to an array of bytes, but you need to choose an encoding; there is no single standard or correct encoding. What used to be called ASCII is useless outside the English-speaking world.

In most encodings, "My Text" will be 7 bytes long. But drop in some European accented characters or Japanese characters, and those (if they can be represented at all) may take two or more bytes each. In some encodings, with some text strings, the byte array representation may be larger than the internal Unicode representation used by System.String.

+12

Daniel Earwicker May 26, '09 at 22:28

Being Unicode does not mean that a string will contain more than one byte per character; it simply means that it "can" take more than one byte per character.

http://www.joelonsoftware.com/articles/Unicode.html

+2

Robin day May 26, '09 at 22:28

Both are pretty close. Only one real answer:

Profile it on your own framework / architecture.

+1

John gietzen May 26, '09 at 22:18

What takes up more memory?

So, you are asking about the size of the in-memory representation. .NET uses UTF-16 for strings, which means that your example will be represented by 14 bytes, as seen in this hex dump (UTF-16LE):

 4d 00 79 00 20 00 54 00 65 00 78 00 74 00

The size of the byte array will depend on the encoding you use to represent the text. If you are using UTF-16, for example

 Encoding.Unicode.GetBytes(string)

you obviously get the same 14 bytes. If you are using UTF-8:

 Encoding.UTF8.GetBytes(string)

you will get an array of 7 bytes:

 4d 79 20 54 65 78 74

This is the same size (and same representation) as ASCII, because your example uses only ASCII characters. All of these characters are, by definition, encoded identically in UTF-8.

Now, if you use non-ASCII characters, say the Japanese "日", UTF-8 needs 3 bytes:

 e6 97 a5

UTF-16 only needs 2 bytes:

 e5 65

Attempting to convert a Japanese character to ASCII will either throw an exception or substitute "?", depending on how you configure the Encoding, because ASCII cannot represent anything other than ASCII characters.
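As a minimal sketch of those two behaviours: the default `Encoding.ASCII` substitutes "?", while an ASCII encoding built with `EncoderExceptionFallback` throws. The class and method names (`AsciiFallbackDemo`, `EncodeLossy`, `EncodeStrict`) are made up for this illustration and are not part of the original answer.

```csharp
using System;
using System.Text;

public static class AsciiFallbackDemo
{
    // Encoding.ASCII uses a replacement fallback: unmappable characters become '?'.
    public static byte[] EncodeLossy(string text) => Encoding.ASCII.GetBytes(text);

    // An ASCII encoding configured to throw instead of substituting.
    public static byte[] EncodeStrict(string text)
    {
        Encoding strictAscii = Encoding.GetEncoding(
            "us-ascii",
            new EncoderExceptionFallback(),
            new DecoderExceptionFallback());
        return strictAscii.GetBytes(text);
    }

    public static void Main()
    {
        // "\u65E5" is the character 日 written as an escape.
        Console.WriteLine(BitConverter.ToString(EncodeLossy("\u65E5"))); // 3F, the code for '?'

        try
        {
            EncodeStrict("\u65E5");
        }
        catch (EncoderFallbackException ex)
        {
            Console.WriteLine("Strict ASCII refused the character: " + ex.Message);
        }
    }
}
```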

Another slightly different example is the European character "ä". 2 bytes in UTF-8:

 c3 a4

UTF-16 also needs 2 bytes:

 e4 00

ASCII cannot represent this character.

To summarize, the memory consumed depends on the actual data in your strings and on the encoding you use to represent it.

Everything above concerns the memory consumed by the raw data. To calculate the total memory consumption, you also need to include the metadata that is part of every array and string, such as its length, and, in the case of .NET strings, a null terminator (2 additional bytes with a value of 0). The number of bytes for the metadata is constant and relatively small, so any difference in overhead between a string and an array will only matter if you have tons of very small texts.
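The byte counts quoted in this answer can be reproduced with `Encoding.GetByteCount`. The following is only a sketch; `EncodingSizeDemo` and its helper methods are hypothetical names chosen for the example, and it measures the raw character data only.

```csharp
using System;
using System.Text;

public static class EncodingSizeDemo
{
    // Number of bytes the text needs when encoded as UTF-8.
    public static int Utf8Bytes(string text) => Encoding.UTF8.GetByteCount(text);

    // Number of bytes the text needs as UTF-16 (Encoding.Unicode is UTF-16LE),
    // which matches the raw character data held by System.String.
    public static int Utf16Bytes(string text) => Encoding.Unicode.GetByteCount(text);

    public static void Main()
    {
        // "\u65E5" is 日 and "\u00E4" is ä, written as escapes so the
        // source file's own encoding does not affect the result.
        string[] samples = { "My Text", "\u65E5", "\u00E4" };

        foreach (string sample in samples)
        {
            Console.WriteLine(
                $"{sample}: UTF-8 = {Utf8Bytes(sample)} bytes, UTF-16 = {Utf16Bytes(sample)} bytes");
        }
    }
}
```

For the three samples this gives 7 and 14 bytes, 3 and 2 bytes, and 2 and 2 bytes respectively, in line with the hex dumps above.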

+1

Eugene beresovsky Oct 30 '12 at 5:20

A byte array will take up less memory unless you have multiple copies of the string, in which case the string will take up less memory thanks to the string table.

But the real question is: does it really matter? There are many advantages to using a string as a string rather than storing it as an array of bytes.

I don't know the details, since your question was very narrow, but I smell premature optimization.
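Assuming "the string table" here refers to the .NET intern pool, a rough sketch of the effect looks like this. `InternDemo` and `SameInstanceAfterIntern` are hypothetical names. Note that only string literals are interned automatically; strings built at run time share storage only if you call `string.Intern` yourself.

```csharp
using System;

public static class InternDemo
{
    // Returns true when two equal strings that start out as separate objects
    // end up as one shared instance after interning.
    public static bool SameInstanceAfterIntern()
    {
        // Build the strings at run time so each one is a distinct object.
        string a = new string("My Text".ToCharArray());
        string b = new string("My Text".ToCharArray());
        bool separateBefore = !object.ReferenceEquals(a, b);

        // string.Intern returns the pooled reference for this content.
        string internedA = string.Intern(a);
        string internedB = string.Intern(b);

        return separateBefore && object.ReferenceEquals(internedA, internedB);
    }

    public static void Main()
    {
        Console.WriteLine(SameInstanceAfterIntern());
    }
}
```

Byte arrays have no built-in equivalent, so two arrays with identical contents are always two separate allocations.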

0

Randolpho May 26, '09 at 22:18

There is a good blog entry here that gives an equation for how much space a string occupies, as well as various interactions with StringBuilder and instance allocation.

0

thecoop May 26, '09 at 22:27

Elliot hugs · Accepted Answer · 2009-05-26T22:20:37+0000

A byte array. It will store your text as ASCII characters (1 byte per character), while a .NET string uses Unicode characters, which are larger. However, keep in mind that .NET strings are probably more useful, and in a large application the difference is unlikely to matter much.

(note also that if you just use ASCII characters in your .NET string, the characters will still only contain 1 byte)
