What does -1 refresh_interval mean in Elasticsearch?

I have read many articles about index refresh in Elasticsearch. I understand that for intervals greater than 0, the value is the time that elapses between consecutive refreshes, each of which writes out new segments and makes them searchable. However, I'm not sure what exactly refresh_interval: -1 does. As I understand it, it is a way to turn off automatic index refresh, but not completely. I have observed that Elasticsearch still flushes segments from time to time, even if refresh_interval is set to -1. I wonder what mechanism controls this flushing activity when automatic refresh is disabled.

Sorry, I know I don't have much code to post, so I'll describe a bit of what I am doing. My application does not need real-time search; it only requires eventual consistency. However, the delay should be reasonable, that is, a few seconds to less than a minute, not half an hour. I was wondering whether I could leave it to Elasticsearch to decide when it is best to refresh, rather than refreshing at regular intervals. The reason is that disabling automatic refresh does bring some performance benefits to my application; for example, JVM heap usage grows less aggressively between garbage collections (see graph below).

After disabling the refresh interval, heap usage grows less aggressively

+7
elasticsearch
2 answers

There is a bit of confusion in your understanding. Refreshing the index and writing to disk are two different processes and are not necessarily related, which is why you still observe segments being written even when refresh_interval is -1.

When a document is indexed, it is added to the in-memory buffer and appended to the translog file. When a refresh occurs, the documents in the buffer are written to a new segment, without an fsync, the segment is opened to make it searchable, and the buffer is cleared. The translog has not been cleared yet, and in fact nothing has been persisted to disk (since there was no fsync).

Now imagine that the refresh does not happen: the index is not refreshed, you cannot search for your documents, and no segments are created in the cache.
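To make this concrete, here is a minimal sketch of triggering a refresh by hand while automatic refresh is disabled. The _refresh endpoint is part of the Elasticsearch REST API; the ES_URL variable, the helper names, and the index name in the usage note are assumptions made for illustration.

```shell
# Assumed base URL of the cluster; override ES_URL if yours differs.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the URL of the refresh endpoint for one index (hypothetical helper).
refresh_url() {
  printf '%s/%s/_refresh' "$ES_URL" "$1"
}

# Ask Elasticsearch to refresh the index once, so that documents still
# sitting in the in-memory buffer are written to a segment and become searchable.
refresh_now() {
  curl -XPOST "$(refresh_url "$1")"
}
```

For example, `refresh_now my_index` (a hypothetical index name) would make everything indexed so far searchable without re-enabling the periodic refresh.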

The translog settings determine when a flush (an actual write to disk) occurs: by default, when the translog reaches 512 MB or after 30 minutes. Only a flush actually persists data on disk; everything else lives in the file system cache (if the node dies or the machine reboots, the cache is lost and the translog is the only salvation).
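As a rough sketch of how flushing can be influenced independently of refresh: index.translog.flush_threshold_size is an index setting and _flush is a REST endpoint. The ES_URL variable, the helper names, and the values in the usage note are assumptions, and the available translog settings differ between Elasticsearch versions, so check the documentation for yours.

```shell
# Assumed base URL of the cluster; override ES_URL if yours differs.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the settings body that changes the translog size at which
# a flush is triggered (hypothetical helper).
flush_threshold_body() {
  printf '{ "index" : { "translog" : { "flush_threshold_size" : "%s" } } }' "$1"
}

# Apply a new flush threshold to one index.
set_flush_threshold() {
  curl -XPUT "$ES_URL/$1/_settings" -H 'Content-Type: application/json' \
       -d "$(flush_threshold_body "$2")"
}

# Force a flush now: segments are committed to disk and the translog
# no longer has to hold the flushed operations.
flush_now() {
  curl -XPOST "$ES_URL/$1/_flush"
}
```

For example, `set_flush_threshold my_index 1gb` followed later by `flush_now my_index`, where both the index name and the size are illustrative values only.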

+9

By default, index.refresh_interval is 1s. Refresh is a fairly expensive operation in ES, especially while indexing. You may notice this when you increase refresh_interval.

By setting index.refresh_interval to -1 you disable automatic refresh, which can give you significant gains when indexing into ES. You just need to disable refresh_interval while indexing and enable it again when you have finished indexing the data:

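    # Disable automatic refresh before bulk indexing
    curl -XPUT "http://localhost:9200/$INDEX_NAME/_settings" -H 'Content-Type: application/json' -d '{ "index" : { "refresh_interval" : "-1" }}'
    # index data......
    # Restore the default refresh interval afterwards
    curl -XPUT "http://localhost:9200/$INDEX_NAME/_settings" -H 'Content-Type: application/json' -d '{ "index" : { "refresh_interval" : "1s" }}'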

After indexing, you can set whatever value suits your consistency requirement. Useful article: https://sematext.com/blog/2013/07/08/elasticsearch-refresh-interval-vs-indexing-performance/

Hope this helps!

+1