Understanding Elasticsearch JVM heap usage

People,

I am trying to reduce memory usage in my elasticsearch (single node cluster) deployment.

I see that 3 GB of the JVM heap is used. To optimize this, I first need to understand the bottleneck, but I have only a limited understanding of how the JVM heap usage is split up.

Field data seems to consume 1.5 GB, while the filter cache and query cache together consume less than 0.5 GB, which adds up to at most 2 GB.

Can someone help me figure out where Elasticsearch eats the remaining 1 GB?
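For reference, the numbers above come from Marvel; roughly the same breakdown should also be visible through the node stats API (a sketch; the exact JSON field names vary between ES versions):

    # Per-node JVM and cache statistics (single-node cluster, so one entry)
    $ curl 'localhost:9200/_nodes/stats?pretty'
    # Fields of interest (1.x-era names):
    #   nodes.<id>.jvm.mem.heap_used_in_bytes
    #   nodes.<id>.indices.fielddata.memory_size_in_bytes
    #   nodes.<id>.indices.filter_cache.memory_size_in_bytes
    #   nodes.<id>.indices.query_cache.memory_size_in_bytes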

Marvel screenshot

2 answers

I can't speak to your specific tuning, but to find out what is happening on your heap you can use the jvisualvm tool (bundled with the JDK) together with Marvel or the BigDesk plugin (my preference) and the _cat API to analyze what is going on.
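For example, the _cat API alone already gives a quick overview of heap and cache usage without any plugin (a sketch; column names differ slightly between versions):

    # Overall heap usage per node
    $ curl 'localhost:9200/_cat/nodes?v&h=name,heap.percent,heap.max'

    # Fielddata usage per node and per field
    $ curl 'localhost:9200/_cat/fielddata?v'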

As you rightly noted, the heap hosts three main caches, namely:

  • the fielddata cache
  • the filter cache
  • the query cache

There is a nice post here (kudos to Igor Kupczyński) that sums up the roles of these caches. This leaves roughly ~30% of the heap (1 GB in your case) for all the other object instances that ES has to create in order to function properly (more on this below).

Here is how I proceeded on my local environment. First, I started my node fresh (with Xmx1g) and waited for green status. Then I started jvisualvm and attached it to my elasticsearch process. I took a heap dump from the Sampler tab so I could compare it later with another dump. My heap initially looks like this (only 1/3 of the maximum heap allocated so far):

[jvisualvm screenshot: initial heap usage just after startup]
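For reference, the setup roughly amounts to the following (a sketch; ES_HEAP_SIZE is the heap environment variable of the 1.x/2.x-era distributions, and the attach and dump themselves happen in the jvisualvm GUI):

    # Start a single node with a 1 GB heap and wait for green status
    $ ES_HEAP_SIZE=1g ./bin/elasticsearch -d
    $ curl 'localhost:9200/_cluster/health?wait_for_status=green&pretty'

    # Find the Elasticsearch PID, then attach jvisualvm to it
    $ jps -l | grep -i elasticsearch
    $ jvisualvm &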

I also checked that my field data and filter caches are empty:

[Marvel screenshot: empty fielddata and filter caches]

Just to make sure, I also ran /_cat/fielddata and, as you can see, no heap has been used by field data since the node started.

    $ curl 'localhost:9200/_cat/fielddata?bytes=b&v'
    id                     host       ip            node    total
    TMVa3S2oTUWOElsBrgFhuw iMac.local 192.168.1.100 Tumbler 0
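If your node is not freshly started, you can get to a comparable baseline by clearing the caches first (a sketch of the indices clear-cache API):

    # Drop the fielddata, filter and query caches on all indices
    $ curl -XPOST 'localhost:9200/_cache/clear?pretty'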

This is the initial situation. Now we need to warm it up a bit, so I started my back-end and front-end applications to put some pressure on the local ES node.
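If you have no application at hand, a few fielddata-hungry searches have a similar effect (a sketch; the index, field and aggregation names are made up):

    # Sorting and aggregating on a field loads it into fielddata (pre doc values)
    $ curl 'localhost:9200/myindex/_search?pretty' -d '{
        "query": { "match_all": {} },
        "sort":  [ { "created_at": "desc" } ],
        "aggs":  { "by_status": { "terms": { "field": "status" } } }
      }'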

After a while, my heap looks like this, so its size has grown by more or less 300 MB (139 MB → 452 MB, not that much, but I ran this experiment on a small data set):

[jvisualvm screenshot: heap usage after load]

My caches have also grown slightly to a few megabytes:

[Marvel screenshot: fielddata and filter caches after load]

    $ curl 'localhost:9200/_cat/fielddata?bytes=b&v'
    id                     host       ip            node    total
    TMVa3S2oTUWOElsBrgFhuw iMac.local 192.168.1.100 Tumbler 9066424

At this point, I took another heap dump to get an idea of how the heap had evolved, computed the retained size of the objects, and compared it with the first dump I took right after starting the node. The comparison looks as follows:

Among the objects that have increased in size, the usual suspects are maps, of course, and any cache-related objects. But we can also find the following classes:

  • NIOFSDirectory instances, used to read Lucene segment files on the filesystem
  • Many interned strings in the form of char arrays or byte arrays
  • Doc values related classes
  • Bit sets
  • etc.

[jvisualvm screenshot: heap dump comparison, retained sizes]
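If you prefer the command line to jvisualvm's compare view, a class histogram taken once before and once after the load gives a rough equivalent (a sketch; <pid> is your Elasticsearch process id):

    # Top classes by instance count and shallow size
    $ jmap -histo:live <pid> | head -n 30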

As you can see, the heap hosts the three main caches, but it is also the place where all the other Java objects that the Elasticsearch process needs live, and those are not necessarily cache-related.

So, if you want to control your heap usage, you obviously have no control over the internal objects that ES needs in order to function properly, but you can definitely influence the sizing of your caches. Looking up the documentation for each of the caches in the first bullet list gives you a good idea of which settings you can tune.
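As a rough sketch of the kind of knobs involved (the exact setting names depend on your ES version; the fielddata circuit breaker is dynamic, while indices.fielddata.cache.size is a node-level setting that goes into elasticsearch.yml and requires a restart):

    # Cap the fielddata circuit breaker at 40% of the heap (dynamic cluster setting)
    $ curl -XPUT 'localhost:9200/_cluster/settings' -d '{
        "persistent": { "indices.breaker.fielddata.limit": "40%" }
      }'

    # Node-level bound on the fielddata cache, set in elasticsearch.yml:
    #   indices.fielddata.cache.size: 30%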

Also, tuning the caches might not be the only option; you may need to rewrite some of your queries to be more memory-friendly, or change your analyzers or some field types in your mapping, etc. Hard to say in your case without more information, but this should give you some leads.
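For example, on 1.x-era ES, putting a field you sort or aggregate on onto doc values keeps it off the heap entirely. A sketch for a new field (existing fields cannot be changed in place, and the index, type and field names here are made up):

    # Add a not_analyzed string field backed by doc values (on disk, not fielddata)
    $ curl -XPUT 'localhost:9200/myindex/_mapping/mytype' -d '{
        "properties": {
          "status": { "type": "string", "index": "not_analyzed", "doc_values": true }
        }
      }'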

Go ahead and run jvisualvm the same way I did here and watch how your heap grows while your application (search + indexing) is hitting ES; you should quickly get an idea of what is going on there.
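If you cannot attach a GUI to the machine, jstat gives a cruder but still useful view of heap occupancy and GC activity over time (a sketch; <pid> is the Elasticsearch process id):

    # Print heap generation usage and GC statistics every 5 seconds
    $ jstat -gcutil <pid> 5000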


Marvel only plots some of the instances living on the heap that are worth monitoring, in this case the caches.

The caches are only one part of the overall heap usage. There are many other instances that occupy heap memory, and they may not have a dedicated plot in the Marvel interface.

Therefore, not all of the heap occupied by ES is just caches.

To understand exactly how the heap is used by the different instances, you need to take a heap dump of the process and then analyze it with a memory analyzer tool, which can give you an accurate picture.
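A hedged sketch of that workflow (the resulting file can then be opened in a tool such as Eclipse MAT or jvisualvm):

    # Dump only live objects from the Elasticsearch process for offline analysis
    $ jmap -dump:live,format=b,file=es-heap.hprof <pid>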

