Is encapsulating Strings as byte[] to save memory overkill? (Java)

I recently looked at Java Swing code and saw the following:

    byte[] fooReference;

    String getFoo() {
        return new String(fooReference);
    }

    void setFoo(String foo) {
        this.fooReference = foo.getBytes();
    }

The above can be useful for reducing your memory footprint, or so I was told.

Is this overkill? Does anyone else encapsulate their Strings this way?

+6
java string memory-management oop byte
8 answers

This is a really, really bad idea. Do not use the platform default encoding. There is nothing to guarantee that if you call setFoo and then getFoo, you will get the same data back.

If you must do something like this, use UTF-8, which can represent all of Unicode for certain... but I really wouldn't do it. It potentially saves some memory, but at the cost of performing needless conversions most of the time, and it is error-prone in that it is easy to fail to use an appropriate encoding.

I dare say that there are some applications where this would be appropriate, but for 99.99% of them this is a terrible idea.
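To illustrate the round-trip concern, here is a minimal sketch. The class name EncodingRoundTrip and the helper roundTrip are hypothetical, and US-ASCII merely stands in for a platform default encoding that cannot represent every character in the string.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingRoundTrip {
    // Hypothetical helper: encode with the given charset, then decode again.
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String original = "na\u00efve \u20ac"; // contains two non-ASCII characters

        // UTF-8 can encode every Unicode character, so the text survives.
        System.out.println(roundTrip(original, StandardCharsets.UTF_8).equals(original));    // true

        // US-ASCII stands in for a limited default encoding: the characters it
        // cannot map are replaced during encoding, so the text is corrupted.
        System.out.println(roundTrip(original, StandardCharsets.US_ASCII).equals(original)); // false
    }
}
```

Passing the charset explicitly on both sides is what makes the UTF-8 case safe; the no-argument getBytes() and new String(byte[]) depend on whatever the platform default happens to be.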

+31

This is not very useful:
1. You copy the string every time getFoo or setFoo is called, so both CPU and memory usage increase.
2. It is obscure.

+10

A small historical tour ...

Using byte arrays instead of String objects actually used to have some significant advantages in the early days of Java (1.0/1.1), provided you could be sure that you would never need anything outside of ISO-8859-1. With the virtual machines of that time, it was more than 10 times faster to use drawBytes() than drawString(), and it really did save memory, which was still very scarce back then; applets usually had a hard-coded memory limit of 32 MB, later 64 MB, anyway. Not only is a byte[] smaller than the char[] embedded in a String object, but you also saved the comparatively heavy String object itself, which really mattered if you had a lot of short strings. In addition, accessing a plain byte array was faster than using the String accessor methods with all their bounds checks.

But since drawBytes was no longer faster as of Java 1.2, and since current JITs are much better than the Symantec JIT of the time, the remaining minimal performance advantage of byte[] arrays over Strings is no longer worth the hassle. The memory advantage still exists, so it may be an option in some very rare extreme scenarios, but nowadays it is nothing to consider unless it is really needed.

+5

This may well be overkill, and it may even consume more memory, since you now have two copies of the string. How long the actual String lives depends on the client, but as with many such hacks, it smells a lot like premature optimization.

+3

If you expect to have many identical strings, another much better way to save memory is String.intern().
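As a rough sketch of what interning does (the class name InternDemo and the helper sharesInstance are made up for illustration), two equal but separately allocated strings collapse to one shared instance:

```java
class InternDemo {
    // Hypothetical helper: true if both strings map to the same pooled instance.
    static boolean sharesInstance(String a, String b) {
        return a.intern() == b.intern();
    }

    public static void main(String[] args) {
        // Two distinct objects with equal contents.
        String a = new String("hello");
        String b = new String("hello");

        System.out.println(a == b);               // false: separate instances
        System.out.println(sharesInstance(a, b)); // true: one canonical copy
    }
}
```

The saving only materializes if the code keeps the reference returned by intern() and lets the original duplicates become garbage.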

+3

Each call to getFoo() instantiates a new String. How is that saving memory? If anything, you are adding overhead for your garbage collector, which has to clean up these new instances once their references become unreachable.
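The allocation on every call can be seen in a small sketch. The class FooHolder below is a hypothetical stand-in for the code in the question, with UTF-8 assumed as the encoding rather than the platform default.

```java
import java.nio.charset.StandardCharsets;

class FooHolder {
    private byte[] fooReference;

    String getFoo() {
        // Decoding allocates a brand-new String on every call.
        return new String(fooReference, StandardCharsets.UTF_8);
    }

    void setFoo(String foo) {
        // Encoding allocates a new byte[] as well.
        this.fooReference = foo.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        FooHolder holder = new FooHolder();
        holder.setFoo("bar");

        String first = holder.getFoo();
        String second = holder.getFoo();

        System.out.println(first.equals(second)); // true: same contents
        System.out.println(first == second);      // false: two separate objects
    }
}
```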

+2

It really doesn't make any sense. If it were a compile-time constant that you never needed to convert back to a String, then it would make a little more sense. You would still have the character encoding problem.

It would make more sense to me if it were a char[] constant. In the real world, there are several JSP compilers that optimize String constants into char[] constants, which in turn can easily be written with Writer#write(char[]). This is ultimately only "slightly" more efficient, but those little bits count for a lot in large and heavily used applications such as Google Search.

Tomcat's JSP compiler, Jasper, does this as well. Check the genStringAsCharArray setting. It then generates

 static final char[] text1 = "some static text".toCharArray(); 

instead of

 static final String text1 = "some static text"; 

which ends up with less overhead. The characters can be written out without going through a whole String instance.
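A minimal sketch of how such a constant might be written out, assuming a hypothetical class CharArrayConstantDemo with a StringWriter standing in for the JSP output writer (this is not Jasper's generated code):

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

class CharArrayConstantDemo {
    // Static text held as a char[] constant instead of a String.
    static final char[] TEXT1 = "some static text".toCharArray();

    // Writes the constant directly via Writer#write(char[]).
    static void render(Writer out) throws IOException {
        out.write(TEXT1);
    }

    // Hypothetical convenience wrapper used only for this demonstration.
    static String renderToString() {
        StringWriter out = new StringWriter();
        try {
            render(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(renderToString()); // prints "some static text"
    }
}
```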

+2

If, after profiling your code, you find that memory usage for strings is a problem, you are much better off using a general-purpose string compressor and storing compressed strings, rather than trying to use UTF-8 strings for the slight reduction in space they give you. With English-language strings, you can generally compress them to 1-2 bits per character; most other languages are probably similar. Getting below 1 bit per character is hard, but possible if you have a lot of data.
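One way to sketch the compressed-storage idea uses the JDK's java.util.zip.Deflater and Inflater. The class CompressedStringDemo and its method names are hypothetical, and DEFLATE is just one readily available general-purpose compressor; it is not claimed to reach the ratios mentioned above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class CompressedStringDemo {
    // Encode as UTF-8, then compress with DEFLATE.
    static byte[] compress(String s) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(s.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress, then decode the bytes as UTF-8.
    static String decompress(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && inflater.needsInput()) {
                    break; // input ended early; stop instead of looping forever
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid DEFLATE data", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("the quick brown fox jumps over the lazy dog ");
        }
        String text = sb.toString();
        byte[] packed = compress(text);
        System.out.println(text.length() + " chars -> " + packed.length + " bytes");
        System.out.println(decompress(packed).equals(text)); // true
    }
}
```

Very short strings can come out larger than their UTF-8 form because of the format overhead, so this only pays off for long or highly repetitive data, and only after profiling shows that string memory is the bottleneck.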

+1