A filter is not a bad option if you are already indexing an appropriate timestamp. You do need to track this timestamp client-side in order to build your requests correctly, and you need to know when to discard it, but neither problem is insurmountable.
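To make the filtering approach concrete, here is a minimal sketch of a search body that uses a range filter on a tracked timestamp. The index layout, the field name `timestamp`, and the `last_seen` value are assumptions for illustration, not anything from your mapping:

```python
def build_timestamp_query(last_seen, page_size=100):
    """Search body returning only documents whose timestamp is after
    the client-side value we are tracking, oldest first."""
    return {
        "size": page_size,
        "query": {
            "range": {
                # "timestamp" is a hypothetical field name; use whatever
                # date field your documents actually index.
                "timestamp": {"gt": last_seen}
            }
        },
        "sort": [{"timestamp": "asc"}],
    }

body = build_timestamp_query("2015-06-01T00:00:00Z")
```

After processing a page, you would update `last_seen` to the timestamp of the last hit and issue the next request with the new value.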
The scroll API is a reliable option for this, as it effectively takes a point-in-time snapshot on the Elasticsearch side. The scroll API is designed to provide a stable view of the index for deep pagination, which addresses exactly the problem of changes arriving mid-pagination that you are experiencing.
You start a scrolling search by sending your query with the scroll parameter, and Elasticsearch returns a scroll_id . You then send requests to /_search/scroll , supplying that identifier; each request returns a page of results and a new scroll_id for the next request.
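The start-then-continue loop described above can be sketched as follows. This is a client-agnostic outline under stated assumptions: `search` and `scroll_next` are hypothetical callables you would wire to your own HTTP client, hitting POST /index/_search?scroll=1m and POST /_search/scroll respectively:

```python
def scroll_all(search, scroll_next, query, scroll="1m"):
    """Drain a scrolling search.

    `search(query, scroll)` starts the scroll (POST /index/_search?scroll=1m)
    and `scroll_next(scroll_id, scroll)` fetches the next page
    (POST /_search/scroll). Both return the parsed JSON response.
    Iteration stops when a page comes back with no hits.
    """
    resp = search(query, scroll)
    while True:
        hits = resp["hits"]["hits"]
        if not hits:
            break
        yield from hits
        # Always pass the most recent scroll_id back to the server.
        resp = scroll_next(resp["_scroll_id"], scroll)
```

Note that each response carries the scroll_id to use for the *next* request, so the loop keeps only the latest one.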
(Note that the scan search type is not needed here; it is meant for extracting documents in bulk and does not apply any sorting.)
Compared to filtering, you still need to keep track of a value: the scroll_id for the next page of results. Whether that is easier than tracking a timestamp depends on your application.
There are other potential disadvantages. Elasticsearch maintains the context for each scrolling search on a single node within the cluster. These contexts can accumulate in your cluster, depending on how heavily you rely on scrolling searches, so you will want to check the impact on performance. And if I remember correctly, scrolling searches also do not survive a node failure or restart.
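Because those server-side contexts accumulate until their timeout expires, it is worth clearing them explicitly once you are done paginating. Here is a minimal sketch of the body for a DELETE /_search/scroll request; the ids shown are hypothetical:

```python
def build_clear_scroll(scroll_ids):
    """Body for DELETE /_search/scroll, which frees the server-side
    search contexts for the given ids instead of waiting for them
    to time out."""
    return {"scroll_id": scroll_ids}

# Hypothetical ids collected from finished scrolling searches.
clear_body = build_clear_scroll(["c2Nhbjs2OzM0NDg1ODpzRlBLc0FXNlNyNm5JWUc1"])
```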
The ES documentation for the scroll API contains details on all of the above.
Bottom line: filtering by timestamp is actually not a bad choice. The scroll API is another valid option designed for a similar use case, but not without its drawbacks.