Loading 26 MB of text data from a database consumes 258 MB of JVM heap

Question

An application (Spring, JPA/Hibernate, Sybase 12, webapp), when run locally, consumes 40 MB of the 256 MB heap on startup according to VisualVM. When I run a search that returns 70,000+ rows (text data, no BLOBs), the heap graph shoots up to 256 MB and the application runs out of memory. I resolved this with setMaxResults(limit). However, when I queried the same data, copied it to a text file, and saved it to the file system, I saw that the size is only 26 MB of text.

So 216 MB (256 minus 40) was consumed by loading 26 MB of text from the database. What is consuming the other 190 MB by the time memory runs out? It might be the framework, but I don't see how it can consume so much more than the actual data being loaded...

** Note again that I have already resolved this with setMaxResults(limit); my question is NOT what to do about it, but rather why it happens, for educational purposes.

+4

java sybase jvm out-of-memory hibernate

Carlos Jaime C. De Leon Jul 27 '12 at 3:16

2 answers

All kinds of things.

Consider, for example, that your rows have 10 text columns, which are represented as a simple Java Bean with 10 string fields.

A String has 4 fields: a char[] and 3 ints.

String is a descendant of Object, which has 1 int and a reference to its class.

On a 64-bit JVM, those references can be 8 bytes (not necessarily, but we will go with this for the sake of argument).

A 10-character string will have a char[10] and 3 ints, which are 4 bytes each.

The char[10] field is a pointer to an array. The array has to track its length, which is probably another 4 bytes, and it is also an object itself (thus a class pointer and another int), plus the data. Characters in Java are represented internally as UTF-16, 2 bytes per character, so the actual array for 10 characters takes 24 bytes (20 bytes of character data plus the 4-byte length). And the reference to that array is a pointer.

So, a single String instance: 8 + 4 for the Object, 8 + 4 + 4 + 4 for the String itself, and 8 + 4 + 20 for the actual data, or 64 bytes.

Your bean has 10 String fields and also extends Object, so that is 8 + 4 + (10 * 8).

So, a single row from your database, for 100 characters of text, is 8 + 4 + (10 * 8) + (10 * 64), which equals 732 bytes.

These are not exact numbers: I can't speak specifically to how arrays are stored, and object references may not be 8 bytes on a 64-bit JVM.

But this gives you some idea of the overhead involved. And this is just for your raw data. If you have these rows stored in an ArrayList, then that is 70,000 * 8 bytes just for the pointers to your objects: 560K just for the structure.
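The back-of-the-envelope estimate above can be written out as a small Java sketch. It simply adds up the terms as listed in the answer (8-byte references, 4-byte ints, 2-byte chars); the class and method names are hypothetical, and the sizes are the answer's assumptions, not measurements of any particular JVM.

```java
public class RowFootprintEstimate {
    // Assumed sizes from the estimate above; real JVM object layouts differ.
    static final int REFERENCE = 8; // 64-bit pointer, assumed uncompressed
    static final int INT = 4;
    static final int CHAR = 2;      // one UTF-16 code unit

    /** Estimated bytes for one String of the given length, term by term. */
    static long stringBytes(int length) {
        long objectPart = REFERENCE + INT;        // class pointer + 1 int
        long stringFields = REFERENCE + 3 * INT;  // char[] reference + 3 ints
        long arrayPart = REFERENCE + INT + (long) CHAR * length; // array overhead + char data
        return objectPart + stringFields + arrayPart;
    }

    /** Estimated bytes for one bean with the given number of String fields. */
    static long rowBytes(int columns, int charsPerColumn) {
        long beanHeader = REFERENCE + INT;
        long fieldReferences = (long) columns * REFERENCE;
        return beanHeader + fieldReferences + columns * stringBytes(charsPerColumn);
    }

    /** Bytes used by a list just to point at each row object. */
    static long listSlotBytes(int rows) {
        return (long) rows * REFERENCE;
    }

    public static void main(String[] args) {
        int rows = 70_000;
        long perRow = rowBytes(10, 10);
        long total = rows * perRow + listSlotBytes(rows);
        System.out.println("Estimated bytes per String: " + stringBytes(10));
        System.out.println("Estimated bytes per row:    " + perRow);
        System.out.println("Estimated bytes for " + rows + " rows: " + total);
    }
}
```

Changing the constants (for example, a 4-byte reference) shows how sensitive the total is to the assumed layout, which is why the figures should be read as an order-of-magnitude guide only.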

+2

Will Hartung Jul 27 '12 at 3:45

Affe · Accepted Answer · 2012-07-27T03:33:10+0000

Some things to consider:

Your operating system probably uses 8-bit encoding to store a text file. Java strings are internally encoded at 16 bits per character, double the space right there.

Numbers with few digits are smaller encoded as text than as numeric types. For example, '1' is a single-byte character in your text file, but a long with a value of 1 is eight times that size in memory.

There will be duplication from Hibernate taking the values out of the SQL result set and mapping them onto your Java objects. It may need to wrap or translate the contents of the result set into the types you defined in your mapping.

If your data per entity is actually small and you have a large number of objects, then the ratio of object overhead to data size will obviously be high.

If you have small pieces of data in collections, the size of the collection can add up quickly relative to the data. In the extreme example, if you have a LinkedList of one- or two-character strings, 192 bits are consumed by pointers alone for every 16-32 bits of actual data. Even in an ArrayList, it would still be 64 bits just for the pointer to each 16-32 bits of data. (Assuming a 64-bit OS, of course.)

Every object you get out of Hibernate is "tracked" for dirty checking in what is called the L1 cache. There can be quite a lot of overhead in the internal data structures and instrumentation used for this, relative to the size of the data, when you have a large number of objects with a small amount of data each.
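A couple of the points above are easy to check in a few lines of Java. This is a minimal sketch with hypothetical class and method names: it uses ISO-8859-1 as a stand-in for "an 8-bit encoding" (the actual file encoding is an assumption), and it reads the 192-bit figure as three 64-bit references per linked-list node (item, next, previous).

```java
import java.nio.charset.StandardCharsets;

public class EncodingOverhead {
    /** Bytes the text would take in an 8-bit encoding (ISO-8859-1 assumed). */
    static int bytesIn8BitFile(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    /** Bytes the same text takes at 16 bits per character (UTF-16, no byte-order mark). */
    static int bytesAsUtf16(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    /** Pointer bits per doubly linked node, assuming three 64-bit references. */
    static int linkedNodePointerBits() {
        return 3 * 64;
    }

    public static void main(String[] args) {
        String sample = "hello world";
        System.out.println("8-bit file bytes: " + bytesIn8BitFile(sample));
        System.out.println("UTF-16 bytes:     " + bytesAsUtf16(sample));
        // The digit '1' as text versus the value 1 held in a long.
        System.out.println("'1' as text: " + bytesIn8BitFile("1") + " byte, as long: " + Long.BYTES + " bytes");
        System.out.println("Pointer bits per linked node: " + linkedNodePointerBits());
    }
}
```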

-

So right away, your 26 MB of data is 52 MB of data in memory in Java, assuming it is all strings; if it includes numbers, dates, and so on, it will be more.

And then, if it is broken into many small pieces, 70,000 small rows rather than 1,000 really long ones, it is quite reasonable for the data structure overhead to be three times the size of the actual data, which easily gets you over 200 MB.
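The arithmetic behind that conclusion can be spelled out as follows. This is only a rough sketch with hypothetical names: the doubling factor and the "three times" overhead are the answer's estimates, not measured values.

```java
public class HeapEstimate {
    /** In-memory size of text at 16 bits per character, given its size in an 8-bit file. */
    static double inMemoryTextMb(double fileMb) {
        return fileMb * 2;
    }

    /** Data plus structural overhead, where overhead is a multiple of the data size. */
    static double withOverheadMb(double dataMb, double overheadFactor) {
        return dataMb + dataMb * overheadFactor;
    }

    public static void main(String[] args) {
        double fileMb = 26;                        // size of the exported text file
        double dataMb = inMemoryTextMb(fileMb);    // text held as Java strings
        double totalMb = withOverheadMb(dataMb, 3); // overhead assumed to be 3x the data
        System.out.println("Text in memory:       " + dataMb + " MB");
        System.out.println("With object overhead: " + totalMb + " MB");
    }
}
```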
