UTF-8 String Class for Java

I need to hold many string objects in memory (hundreds of MB), and I want to store them as UTF-8, since in most cases this will take half the memory of the default representation.
By default, the String class needs about 60 bytes for a 12-character string (see http://blog.griddynamics.com/2010/01/java-tricks-reducing-memory-consumption.html ).
Most of my strings are 10-20 characters long.
Is there an open source library that offers a wrapper for such strings?
I know how to convert a String to a UTF-8 byte array, but I'm looking for a wrapper class that provides all the necessary utility functions (hashCode, equals, toString, fromString, etc.).

+6
2 answers

Apache Avro has a Utf8 wrapper class that implements CharSequence, but I don't know how much memory such objects consume.

Hadoop has a Text class with a similar interface.

+2

If you need a separate object for each string and want them to be as compact as possible, use byte arrays. For mostly ASCII text that is 1 byte per character instead of 2, and you avoid the overhead of the String object header (which probably adds about 32 bytes per object).

But, of course, you cannot use any String methods on them without first converting to String.
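A minimal sketch of what such a byte-array wrapper might look like, for illustration only. The class name `Utf8String` and its methods are hypothetical (they are not taken from Avro, Hadoop, or any other library), and the object still carries its own header plus the array header, so the saving is mainly in the character data.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical wrapper (not from any library): an immutable string kept as UTF-8 bytes.
final class Utf8String implements Comparable<Utf8String> {
    private final byte[] bytes;

    private Utf8String(byte[] bytes) {
        this.bytes = bytes;
    }

    // Encodes the given String as UTF-8 and wraps the resulting array.
    static Utf8String fromString(String s) {
        return new Utf8String(s.getBytes(StandardCharsets.UTF_8));
    }

    // Number of bytes used for the character data.
    int byteLength() {
        return bytes.length;
    }

    // Decodes back to a regular String; this allocates a new object on every call.
    @Override
    public String toString() {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Utf8String && Arrays.equals(bytes, ((Utf8String) o).bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }

    // Unsigned byte-wise comparison, which for valid UTF-8 is code point order.
    // This can differ from String.compareTo for characters outside the BMP.
    @Override
    public int compareTo(Utf8String other) {
        int n = Math.min(bytes.length, other.bytes.length);
        for (int i = 0; i < n; i++) {
            int d = (bytes[i] & 0xFF) - (other.bytes[i] & 0xFF);
            if (d != 0) {
                return d;
            }
        }
        return bytes.length - other.bytes.length;
    }
}
```

Because `equals` and `hashCode` work directly on the bytes, instances can serve as `HashMap` keys without being decoded; only `toString` pays the conversion cost.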

But if you really want to save space, store the strings back to back in a few large arrays, with "dope vectors" (arrays of offsets) to locate the individual strings.
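The packed layout could be sketched roughly as follows. The class `PackedStrings` and its methods are hypothetical names chosen for illustration, and for brevity the sketch uses a single backing array (so it is limited to about 2 GB of data) rather than several.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical append-only store: all strings share one byte array, and an
// offsets array (the dope vector) records where each string starts.
final class PackedStrings {
    private byte[] data = new byte[64];
    // offsets[i] is the start of string i; offsets[count] is the end of the used data.
    private int[] offsets = new int[17];
    private int count = 0;

    // Appends a string and returns the index under which it can be read back.
    int add(String s) {
        byte[] encoded = s.getBytes(StandardCharsets.UTF_8);
        int start = offsets[count];
        int end = start + encoded.length;
        if (end > data.length) {
            data = Arrays.copyOf(data, Math.max(end, data.length * 2));
        }
        if (count + 2 > offsets.length) {
            offsets = Arrays.copyOf(offsets, offsets.length * 2);
        }
        System.arraycopy(encoded, 0, data, start, encoded.length);
        count++;
        offsets[count] = end;
        return count - 1;
    }

    // Decodes the string stored at the given index.
    String get(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        int start = offsets[index];
        return new String(data, start, offsets[index + 1] - start, StandardCharsets.UTF_8);
    }

    int size() {
        return count;
    }
}
```

With this layout the per-string cost is the UTF-8 bytes plus one 4-byte offset, with no per-string object header. The trade-off is that strings are addressed by integer index and cannot be removed individually.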

0
