ElasticSearch documentation says that you cannot use scrolling for user queries, only for data conversion

I am new to ES and confused by the scroll documentation. From the docs: "Scrolling is not intended for real-time user requests, but rather for processing large amounts of data, for example, reindexing the contents of one index into a new index with a different configuration."

And yet... further down the same page it says that you should not use from and size to paginate, because it is "very inefficient." The Java API page that describes search shows an example of paging with scroll.

So, if I want to present sorted search results a page at a time, which approach is recommended: from/size or scroll?

3 answers

from/size is very inefficient if you want to do deep pagination or request a lot of results per page.

The reason is that the results are first sorted on each shard, and all of those results are then collected, merged, and sorted again by the coordinating node. This gets more expensive as pages grow in size or depth. You will find a very good example described here.
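To make the cost concrete, here is a toy simulation in plain Python (no Elasticsearch involved). It assumes each shard simply holds a list of sort keys and that lower keys sort first; the function names are made up for illustration. It shows that every shard has to hand over from + size candidates, so a deep page makes the coordinating node collect far more than the 10 hits it finally returns.

```python
import heapq


def shard_top_hits(shard_docs, from_, size):
    """Each shard returns its own top (from_ + size) hits, not just one page."""
    return heapq.nsmallest(from_ + size, shard_docs)


def coordinator_page(shards, from_, size):
    """Collect every shard's candidates, sort them, then skip the first from_."""
    candidates = []
    for shard_docs in shards:
        candidates.extend(shard_top_hits(shard_docs, from_, size))
    candidates.sort()
    # Return the requested page plus how many candidates had to be collected.
    return candidates[from_:from_ + size], len(candidates)


# Toy data: 5 shards, each holding 10,000 sort keys.
shards = [list(range(s, 50_000, 5)) for s in range(5)]

page, collected = coordinator_page(shards, from_=0, size=10)
deep_page, deep_collected = coordinator_page(shards, from_=9_990, size=10)

print(collected)       # candidates gathered for the first page
print(deep_collected)  # candidates gathered for a page 9,990 hits deep
```

Both calls return 10 hits, but the deep page forces every shard to sort and ship 10,000 candidates.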

If you can cap how far your users are allowed to page (for example, at around 1000 results), from/size will work fine for you.

If this is not an option, you can still use scroll, but you will lose some features, such as aggregations, and keeping the search context alive has a cost.
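For reference, a scroll loop looks roughly like the sketch below. The search, scroll, and clear callables are hypothetical stand-ins for whatever client you use, and FakeCluster is only an in-memory substitute so the example runs on its own; the one thing taken from real responses is the shape, a "_scroll_id" plus "hits.hits". Note the finally block: the search context is a server-side resource, so it should be released as soon as you are done.

```python
def scroll_all(search, scroll, clear, keep_alive="1m"):
    """Yield every hit, one batch at a time, then release the search context."""
    response = search(keep_alive)
    scroll_id = response["_scroll_id"]
    try:
        while response["hits"]["hits"]:
            yield from response["hits"]["hits"]
            response = scroll(scroll_id, keep_alive)
            scroll_id = response["_scroll_id"]
    finally:
        clear(scroll_id)  # do not wait for the keep-alive to expire


class FakeCluster:
    """In-memory stand-in for a cluster (an assumption, for illustration only)."""

    def __init__(self, docs, batch_size):
        self.docs = docs
        self.batch_size = batch_size
        self.pos = 0
        self.open_contexts = 0

    def _batch(self):
        hits = self.docs[self.pos:self.pos + self.batch_size]
        self.pos += len(hits)
        return {"_scroll_id": "ctx-1", "hits": {"hits": hits}}

    def search(self, keep_alive):
        self.open_contexts += 1  # a search context is now held open
        return self._batch()

    def scroll(self, scroll_id, keep_alive):
        return self._batch()

    def clear(self, scroll_id):
        self.open_contexts -= 1


cluster = FakeCluster(docs=list(range(25)), batch_size=10)
all_hits = list(scroll_all(cluster.search, cluster.scroll, cluster.clear))
```

The loop only moves forward; there is no way to jump back to an earlier page without starting a new scroll.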


Both scroll and from/size suffer from deep pagination. You can try a hybrid approach: fetch in larger steps (e.g. 100 records at a time), but show them in the UI in smaller batches (e.g. only 10). As the user keeps paging, at some point you start another search in the background for the next batch while the user is busy. If you track these sessions and get a sense of how deep users typically go, you can tune your batch size and scroll size accordingly.
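A minimal sketch of that hybrid idea, assuming a hypothetical fetch_batch(offset, limit) function that returns the next slice of sorted results from the backend. For brevity it fetches the next batch synchronously when the buffer runs out rather than in a background task.

```python
class BufferedPager:
    """Serve small UI pages from a buffer that is filled in larger batches."""

    def __init__(self, fetch_batch, batch_size=100, page_size=10):
        self.fetch_batch = fetch_batch  # hypothetical: (offset, limit) -> list
        self.batch_size = batch_size
        self.page_size = page_size
        self.buffer = []
        self.backend_calls = 0

    def page(self, page_number):
        """Return UI page page_number (0-based), fetching only when needed."""
        start = page_number * self.page_size
        end = start + self.page_size
        while len(self.buffer) < end:
            batch = self.fetch_batch(len(self.buffer), self.batch_size)
            self.backend_calls += 1
            self.buffer.extend(batch)
            if len(batch) < self.batch_size:
                break  # the backend has no more results
        return self.buffer[start:end]


# Stand-in backend: 250 already-sorted results held in memory.
results = list(range(250))
pager = BufferedPager(lambda offset, limit: results[offset:offset + limit])

first = pager.page(0)
tenth = pager.page(9)
calls_after_ten_pages = pager.backend_calls  # still served by the first batch
eleventh = pager.page(10)                    # triggers the second batch
```

With these numbers the backend is hit once per ten UI pages, which is the trade-off the batch size lets you tune.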

Between the two, I have had a better experience with scroll than with from/size in terms of response time, but YMMV. It depends on your data, your shard setup, etc.


You can use search_after. The basic flow is:

  • Run a regular search that returns documents sorted by date.
  • Run the next query with the search_after field in the body to tell Elasticsearch to return only documents after the specified document (date).
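The two steps above can be sketched as request bodies built in Python. The field names date and doc_id are assumptions (doc_id stands for some unique field used as a tiebreaker so that two documents with the same date have a defined order), and the response below is a hand-written stand-in that only mimics the "sort" array returned with each hit of a sorted search.

```python
def build_search_body(size, last_sort_values=None):
    """Build the body for the first page, or for the page after a given hit."""
    body = {
        "size": size,
        "query": {"match_all": {}},
        "sort": [
            {"date": "asc"},
            {"doc_id": "asc"},  # assumed unique field, used as a tiebreaker
        ],
    }
    if last_sort_values is not None:
        # Resume right after the last document of the previous page.
        body["search_after"] = last_sort_values
    return body


def next_cursor(response):
    """Take the sort values of the last hit; they become the next search_after."""
    hits = response["hits"]["hits"]
    return hits[-1]["sort"] if hits else None


first_body = build_search_body(size=2)

# Hand-written stand-in for the response to first_body.
fake_response = {
    "hits": {
        "hits": [
            {"_id": "a", "sort": [1700000000000, "a"]},
            {"_id": "b", "sort": [1700000005000, "b"]},
        ]
    }
}

second_body = build_search_body(size=2, last_sort_values=next_cursor(fake_response))
```

The cursor is just the last hit's sort values, so the client can carry it between page requests without the server keeping any context open.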

This way, your results stay robust against any document updates or deletions and remain accurate. You also avoid the cost of scroll (as you have probably already read) and the linear execution time that from/size incurs on every request, counted from your first result.

See docs for more details.
