Compressing an array of integers in Java

I have a very large array of integers that I would like to compress. The way to do this in Java seems to be something like this:

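```java
int[] myIntArray; // the large array to compress (assigned elsewhere)
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream(1024);
ObjectOutputStream objectOutputStream =
        new ObjectOutputStream(new DeflaterOutputStream(byteArrayOutputStream));
objectOutputStream.writeObject(myIntArray);
objectOutputStream.close(); // required: finishes the deflate stream so all compressed bytes are written
```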

As I understand it, the int array must first be converted to bytes. I know this is fast, but it still means creating a whole new byte array, scanning the entire source int array, converting each value to bytes and copying it into the new array.

Is there a way to skip the byte conversion and compress the integers directly?

6 answers

Skip the ObjectOutputStream and just write each int directly as four bytes. DataOutputStream.writeInt, for example, is an easy way to do this.
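A minimal sketch of that suggestion, assuming the JDK's java.util.zip streams are acceptable. Each int is still turned into four bytes, but one value at a time as it streams into the compressor, so no uncompressed byte[] copy of the whole array is built. The class IntArrayDeflate and its methods are hypothetical names, and writing the length first is a format choice of this sketch so the reader knows how many ints to expect.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

final class IntArrayDeflate {

    // Writes the length, then each int as 4 big-endian bytes, straight into a deflate stream.
    // No intermediate byte[] copy of the whole array is built.
    static byte[] compress(int[] values) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new DeflaterOutputStream(sink)))) {
            out.writeInt(values.length);
            for (int v : values) {
                out.writeInt(v);
            }
        } // close() flushes the buffer and finishes the deflate stream
        return sink.toByteArray();
    }

    // Reverses compress(): reads the length, then that many ints.
    static int[] decompress(byte[] compressed) throws IOException {
        try (DataInputStream in = new DataInputStream(new BufferedInputStream(
                new InflaterInputStream(new ByteArrayInputStream(compressed))))) {
            int[] values = new int[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readInt();
            }
            return values;
        }
    }

    public static void main(String[] args) throws IOException {
        int[] data = new int[100_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i % 100; // deliberately repetitive data
        }
        byte[] compressed = compress(data);
        System.out.println("raw: " + data.length * 4 + " bytes, compressed: " + compressed.length);
        System.out.println("round trip ok: " + Arrays.equals(data, decompress(compressed)));
    }
}
```

The BufferedOutputStream and BufferedInputStream layers are there so the deflate streams see larger chunks rather than a few bytes per call; the format itself does not depend on them.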

Hm. A general-purpose compression algorithm will not necessarily do a good job of compressing an array of binary values unless there is a lot of redundancy. You might be better off developing something of your own based on what you know about the data.

What are you trying to compress?
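As one illustration of exploiting knowledge of the data, assuming the values are sorted or change slowly (an assumption; the question does not say so), delta encoding replaces each value with its difference from the previous one. The deltas are small and repetitive, which a general-purpose compressor or a variable-length integer encoding handles much better than the raw values. DeltaCoding is a hypothetical helper, not a library class.

```java
import java.util.Arrays;

final class DeltaCoding {

    // Replaces each element with its difference from the previous one (the first is kept as-is).
    static int[] encode(int[] values) {
        int[] deltas = new int[values.length];
        int previous = 0;
        for (int i = 0; i < values.length; i++) {
            deltas[i] = values[i] - previous; // may wrap on overflow; decode() wraps back consistently
            previous = values[i];
        }
        return deltas;
    }

    // Running sum of the deltas restores the original values.
    static int[] decode(int[] deltas) {
        int[] values = new int[deltas.length];
        int previous = 0;
        for (int i = 0; i < deltas.length; i++) {
            previous += deltas[i];
            values[i] = previous;
        }
        return values;
    }

    public static void main(String[] args) {
        int[] sorted = {1000, 1003, 1007, 1010, 1012, 1020};
        int[] deltas = encode(sorted);
        System.out.println(Arrays.toString(deltas));              // [1000, 3, 4, 3, 2, 8]
        System.out.println(Arrays.equals(sorted, decode(deltas))); // true
    }
}
```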

You could use the encoding used by Protocol Buffers (varints). Each non-negative 32-bit integer is represented by 1-5 bytes depending on its magnitude.

In addition, the new “packed” encoding for repeated fields means that you basically get a small “header” saying how big the data is (and which field it belongs to), followed by just the data. ObjectOutputStream probably does something similar, but this is a recent addition to PB :)

Note that the savings depend on the magnitude of each value, not on how often a value occurs. That will largely determine whether this is useful to you.
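A hand-written sketch of the base-128 varint scheme described above (7 payload bits per byte, high bit set while more bytes follow), combined with the ZigZag mapping that Protocol Buffers uses for its sint32 type so that small negative values also take 1-5 bytes. This is not the protobuf library's API; VarInts is a hypothetical class, and in this sketch the element count must be stored separately to decode.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

final class VarInts {

    // ZigZag maps signed ints to unsigned ones so small magnitudes stay small:
    // 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    static int zigZag(int n) {
        return (n << 1) ^ (n >> 31);
    }

    static int unZigZag(int n) {
        return (n >>> 1) ^ -(n & 1);
    }

    // Emits the low 7 bits per byte, setting the high bit while more bytes follow.
    static void writeVarInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    static byte[] encode(int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : values) {
            writeVarInt(out, zigZag(v));
        }
        return out.toByteArray();
    }

    // Decodes exactly `count` varints from the start of the byte array.
    static int[] decode(byte[] bytes, int count) {
        int[] values = new int[count];
        int pos = 0;
        for (int i = 0; i < count; i++) {
            int result = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                result |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            values[i] = unZigZag(result);
        }
        return values;
    }

    public static void main(String[] args) {
        int[] data = {0, 1, -1, 300, -300, Integer.MAX_VALUE, Integer.MIN_VALUE};
        byte[] encoded = encode(data);
        System.out.println(encoded.length + " bytes instead of " + data.length * 4); // 17 bytes instead of 28
        System.out.println(Arrays.equals(data, decode(encoded, data.length)));     // true
    }
}
```

As the answer notes, the saving comes from small magnitudes: values near zero take one byte, while values near the int limits take five, one more than a plain writeInt.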

A byte array will not save you much memory unless you make it a byte array holding unsigned ints, which is very dangerous in Java. It trades the memory overhead for extra processing time in the code that checks and converts each value. That may be fine for storing data, but there are already solutions for storing data.
If you are not doing this for serialization, I think you are wasting your time.
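To make the "unsigned ints in a byte array" idea concrete, here is a minimal sketch assuming every value fits in 0..255 (an assumption; nothing in the question guarantees it). It shows both costs mentioned above: a range check on the way in, and the signed-byte pitfall on the way out, since Java's byte is signed and values 128-255 read back negative unless masked. UnsignedBytePacking is a hypothetical name.

```java
import java.util.Arrays;

final class UnsignedBytePacking {

    // Packs ints into one byte each; only valid if every value is in 0..255.
    static byte[] pack(int[] values) {
        byte[] packed = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            if (values[i] < 0 || values[i] > 255) {
                throw new IllegalArgumentException("value out of range 0..255: " + values[i]);
            }
            packed[i] = (byte) values[i]; // e.g. 200 is stored as the signed byte -56
        }
        return packed;
    }

    // Byte.toUnsignedInt (equivalent to b & 0xFF) is needed; a plain widening gives negatives.
    static int[] unpack(byte[] packed) {
        int[] values = new int[packed.length];
        for (int i = 0; i < packed.length; i++) {
            values[i] = Byte.toUnsignedInt(packed[i]);
        }
        return values;
    }

    public static void main(String[] args) {
        int[] data = {0, 17, 128, 200, 255};
        byte[] packed = pack(data);
        System.out.println(packed[3]);                       // -56: the signed-byte pitfall
        System.out.println(Arrays.toString(unpack(packed))); // [0, 17, 128, 200, 255]
    }
}
```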

If the int array has no duplicates (and the order of the values does not matter), you can use java.util.BitSet instead.

Since it is implemented as an array of bits, with each bit indicating whether a particular integer is present in the BitSet, its memory usage is quite low, so less space is needed for serialization.
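A minimal sketch of the BitSet approach using only standard java.util.BitSet methods (set, toByteArray, valueOf, stream). It assumes all values are non-negative, since BitSet indices cannot be negative, and it returns the distinct values in ascending order, so duplicates and the original order are lost. The serialized size is about (largest value + 1) / 8 bytes, so it pays off when the values are dense. BitSetPacking is a hypothetical name.

```java
import java.util.Arrays;
import java.util.BitSet;

final class BitSetPacking {

    // Sets one bit per value; values must be non-negative.
    static byte[] toBytes(int[] values) {
        BitSet bits = new BitSet();
        for (int v : values) {
            bits.set(v); // throws IndexOutOfBoundsException for negative v
        }
        return bits.toByteArray(); // length is (bits.length() + 7) / 8
    }

    // Returns the distinct values in ascending order.
    static int[] fromBytes(byte[] bytes) {
        return BitSet.valueOf(bytes).stream().toArray();
    }

    public static void main(String[] args) {
        int[] data = {5, 1000, 3, 64, 999};
        byte[] packed = toBytes(data);
        System.out.println(packed.length + " bytes");            // 126 bytes: bits 0..1000
        System.out.println(Arrays.toString(fromBytes(packed))); // [3, 5, 64, 999, 1000]
    }
}
```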

In your example, you are writing the compressed stream to a ByteArrayOutputStream. Your compressed array has to exist somewhere, and if you want it in memory, ByteArrayOutputStream is the likely choice. You can also write the stream to a socket or a file, in which case you do not duplicate the data in memory. If your array is 800 MB and your machine has 1 GB of memory, you can still easily write the array to a compressed file with the example you provided; just replace the ByteArrayOutputStream with a file stream.
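A minimal sketch of that change: the same ObjectOutputStream over DeflaterOutputStream chain from the question, with a FileOutputStream as the sink so the compressed bytes go straight to disk. CompressedIntFile and the temporary file in main are illustrative choices, not part of the original answer.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

final class CompressedIntFile {

    // Same chain as the question, but the compressed bytes are written to a file, not kept in memory.
    static void write(int[] values, String path) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new DeflaterOutputStream(new BufferedOutputStream(new FileOutputStream(path))))) {
            out.writeObject(values);
        } // closing finishes the deflate stream and flushes the file
    }

    static int[] read(String path) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new InflaterInputStream(new BufferedInputStream(new FileInputStream(path))))) {
            return (int[]) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] data = new int[1_000_000]; // mostly zeros, so it compresses well
        data[42] = 7;
        File file = File.createTempFile("ints", ".deflate");
        file.deleteOnExit();
        write(data, file.getPath());
        System.out.println("file size: " + file.length() + " bytes");
        System.out.println("round trip ok: " + Arrays.equals(data, read(file.getPath())));
    }
}
```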

The ObjectOutputStream format is actually quite efficient. It will not duplicate your array in memory and has special code for efficiently writing arrays.

Do you want to work with the compressed array in memory? Could your data be stored as a sparse array? A sparse array works well when your data has large gaps.
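A minimal read-only sketch of a sparse array, assuming the "gaps" are runs of zeros (an assumption). Only the non-zero entries are kept, as sorted positions plus their values, and lookups use Arrays.binarySearch, so the data can be read without ever decompressing it. SparseInts is a hypothetical class name.

```java
import java.util.Arrays;

// Stores only the non-zero entries of a conceptually large int array.
final class SparseInts {
    private final int length;
    private final int[] indices; // ascending positions of non-zero entries
    private final int[] values;  // values[k] belongs at position indices[k]

    SparseInts(int[] dense) {
        length = dense.length;
        int nonZero = 0;
        for (int v : dense) {
            if (v != 0) nonZero++;
        }
        indices = new int[nonZero];
        values = new int[nonZero];
        int k = 0;
        for (int i = 0; i < dense.length; i++) {
            if (dense[i] != 0) {
                indices[k] = i;
                values[k] = dense[i];
                k++;
            }
        }
    }

    int get(int i) {
        if (i < 0 || i >= length) throw new IndexOutOfBoundsException("index " + i);
        int k = Arrays.binarySearch(indices, i);
        return k >= 0 ? values[k] : 0; // positions inside a gap read as 0
    }

    int length() { return length; }

    int storedEntries() { return indices.length; }

    public static void main(String[] args) {
        int[] dense = new int[1_000_000];
        dense[10] = 4;
        dense[500_000] = -9;
        SparseInts sparse = new SparseInts(dense);
        System.out.println(sparse.storedEntries());                    // 2
        System.out.println(sparse.get(500_000) + " " + sparse.get(11)); // -9 0
    }
}
```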
