Understanding Elasticsearch JVM heap usage

People,

I am trying to reduce memory usage in my elasticsearch (single node cluster) deployment.

I see that 3 GB of the JVM heap is used. To optimize this, I first need to understand the bottleneck, but I have only a limited understanding of how the JVM heap usage is split up.

Field data seems to consume 1.5 GB, while the filter cache and query cache together consume less than 0.5 GB, which adds up to at most 2 GB.

Can someone help me figure out where Elasticsearch eats the remaining 1 GB?
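For reference, the numbers above come from Marvel; roughly the same breakdown should also be visible through the node stats API (a sketch; the exact JSON field names vary between ES versions):

    # Per-node JVM and cache statistics (single-node cluster, so one entry)
    $ curl 'localhost:9200/_nodes/stats?pretty'
    # Fields of interest (1.x-era names):
    #   nodes.<id>.jvm.mem.heap_used_in_bytes
    #   nodes.<id>.indices.fielddata.memory_size_in_bytes
    #   nodes.<id>.indices.filter_cache.memory_size_in_bytes
    #   nodes.<id>.indices.query_cache.memory_size_in_bytes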

Marvel screenshot

2 answers

I can't speak to your specific tuning, but to find out what is happening on your heap you can use the jvisualvm tool (bundled with the JDK) together with Marvel or the BigDesk plugin (my preference) and the _cat API to analyze what is going on.
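For example, the _cat API alone already gives a quick overview of heap and cache usage without any plugin (a sketch; column names differ slightly between versions):

    # Overall heap usage per node
    $ curl 'localhost:9200/_cat/nodes?v&h=name,heap.percent,heap.max'

    # Fielddata usage per node and per field
    $ curl 'localhost:9200/_cat/fielddata?v'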

As you rightly noted, the heap hosts three main caches, namely:

  • the fielddata cache
  • the filter cache
  • the query cache

There is a nice post here (kudos to Igor Kupczyński) that sums up the roles of these caches. This leaves roughly ~30% of the heap (1 GB in your case) for all the other object instances that ES has to create in order to function properly (more on this below).

Here is how I proceeded on my local environment. First, I started my node fresh (with Xmx1g) and waited for green status. Then I started jvisualvm and attached it to my elasticsearch process. I took a heap dump from the Sampler tab so I could compare it later with another dump. My heap initially looks like this (only 1/3 of the maximum heap allocated so far):

[jvisualvm screenshot: initial heap usage just after startup]
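For reference, the setup roughly amounts to the following (a sketch; ES_HEAP_SIZE is the heap environment variable of the 1.x/2.x-era distributions, and the attach and dump themselves happen in the jvisualvm GUI):

    # Start a single node with a 1 GB heap and wait for green status
    $ ES_HEAP_SIZE=1g ./bin/elasticsearch -d
    $ curl 'localhost:9200/_cluster/health?wait_for_status=green&pretty'

    # Find the Elasticsearch PID, then attach jvisualvm to it
    $ jps -l | grep -i elasticsearch
    $ jvisualvm &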

I also checked that my field data and filter caches are empty:

[Marvel screenshot: empty fielddata and filter caches]

Just to make sure, I also ran /_cat/fielddata and, as you can see, no heap has been used by field data since the node started.

    $ curl 'localhost:9200/_cat/fielddata?bytes=b&v'
    id                     host       ip            node    total
    TMVa3S2oTUWOElsBrgFhuw iMac.local 192.168.1.100 Tumbler 0
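If your node is not freshly started, you can get to a comparable baseline by clearing the caches first (a sketch of the indices clear-cache API):

    # Drop the fielddata, filter and query caches on all indices
    $ curl -XPOST 'localhost:9200/_cache/clear?pretty'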

This is the initial situation. Now we need to warm it up a bit, so I started my back-end and front-end applications to put some pressure on the local ES node.
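If you have no application at hand, a few fielddata-hungry searches have a similar effect (a sketch; the index, field and aggregation names are made up):

    # Sorting and aggregating on a field loads it into fielddata (pre doc values)
    $ curl 'localhost:9200/myindex/_search?pretty' -d '{
        "query": { "match_all": {} },
        "sort":  [ { "created_at": "desc" } ],
        "aggs":  { "by_status": { "terms": { "field": "status" } } }
      }'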

After a while, my heap looks like this, so its size has grown by more or less 300 MB (139 MB → 452 MB, not that much, but I ran this experiment on a small data set):

[jvisualvm screenshot: heap usage after load]

My caches have also grown slightly to a few megabytes:

[Marvel screenshot: fielddata and filter caches after load]

    $ curl 'localhost:9200/_cat/fielddata?bytes=b&v'
    id                     host       ip            node    total
    TMVa3S2oTUWOElsBrgFhuw iMac.local 192.168.1.100 Tumbler 9066424

At this point, I took another heap dump to get an idea of how the heap had evolved, computed the retained size of the objects, and compared it with the first dump I took right after starting the node. The comparison looks as follows:

Among the objects that have increased in size, the usual suspects are maps, of course, and any cache-related objects. But we can also find the following classes:

  • NIOFSDirectory instances, used to read Lucene segment files on the filesystem
  • Many interned strings in the form of char arrays or byte arrays
  • Doc values related classes
  • Bit sets
  • etc.

[jvisualvm screenshot: heap dump comparison, retained sizes]
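If you prefer the command line to jvisualvm's compare view, a class histogram taken once before and once after the load gives a rough equivalent (a sketch; <pid> is your Elasticsearch process id):

    # Top classes by instance count and shallow size
    $ jmap -histo:live <pid> | head -n 30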

As you can see, the heap hosts the three main caches, but it is also the place where all the other Java objects that the Elasticsearch process needs live, and those are not necessarily cache-related.

So, if you want to control your heap usage, you obviously have no control over the internal objects that ES needs in order to function properly, but you can definitely influence the sizing of your caches. Looking up the documentation for each of the caches in the first bullet list gives you a good idea of which settings you can tune.
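As a rough sketch of the kind of knobs involved (the exact setting names depend on your ES version; the fielddata circuit breaker is dynamic, while indices.fielddata.cache.size is a node-level setting that goes into elasticsearch.yml and requires a restart):

    # Cap the fielddata circuit breaker at 40% of the heap (dynamic cluster setting)
    $ curl -XPUT 'localhost:9200/_cluster/settings' -d '{
        "persistent": { "indices.breaker.fielddata.limit": "40%" }
      }'

    # Node-level bound on the fielddata cache, set in elasticsearch.yml:
    #   indices.fielddata.cache.size: 30%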

Also, tuning the caches might not be the only option; you may need to rewrite some of your queries to be more memory-friendly, or change your analyzers or some field types in your mapping, etc. Hard to say in your case without more information, but this should give you some leads.
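For example, on 1.x-era ES, putting a field you sort or aggregate on onto doc values keeps it off the heap entirely. A sketch for a new field (existing fields cannot be changed in place, and the index, type and field names here are made up):

    # Add a not_analyzed string field backed by doc values (on disk, not fielddata)
    $ curl -XPUT 'localhost:9200/myindex/_mapping/mytype' -d '{
        "properties": {
          "status": { "type": "string", "index": "not_analyzed", "doc_values": true }
        }
      }'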

Go ahead and run jvisualvm the same way I did here and watch how your heap grows while your application (search + indexing) is hitting ES; you should quickly get an idea of what is going on there.
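If you cannot attach a GUI to the machine, jstat gives a cruder but still useful view of heap occupancy and GC activity over time (a sketch; <pid> is the Elasticsearch process id):

    # Print heap generation usage and GC statistics every 5 seconds
    $ jstat -gcutil <pid> 5000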


Marvel only plots some of the instances living on the heap that are worth monitoring, in this case the caches.

The caches are only one part of the overall heap usage. There are many other instances that occupy heap memory, and they may not have a dedicated plot in the Marvel interface.

Therefore, not all of the heap occupied by ES is just caches.

To understand exactly how the heap is used by the different instances, you need to take a heap dump of the process and then analyze it with a memory analyzer tool, which can give you an accurate picture.
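A hedged sketch of that workflow (the resulting file can then be opened in a tool such as Eclipse MAT or jvisualvm):

    # Dump only live objects from the Elasticsearch process for offline analysis
    $ jmap -dump:live,format=b,file=es-heap.hprof <pid>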

