I think it's a good idea to start with scan requests using the scroll API:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/scan-scroll.html
It works much like a database cursor: you open a query with a time limit, and it returns a scroll ID. You then pass that scroll ID to fetch the first batch of results, which come back together with a new scroll ID. Examples below:
curl -XGET 'localhost:9200/_search?search_type=scan&scroll=10m&size=1000' -d ' { "query" : { "match_all" : {} } } '
This returns a _scroll_id, which you then use to retrieve the documents:
curl -XGET 'localhost:9200/_search/scroll?scroll=10m' -d '<_SCROLL_ID_HERE>'
Note that each call returns up to 1000 documents per primary shard, so if you have 4 primary shards, a single call can return up to 4000 documents. Along with the documents, each call returns a new _scroll_id, which you then use for the next call. scroll=10m sets a 10-minute timeout that keeps the scroll context open between calls.
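Because every response carries a new _scroll_id, the second call is usually wrapped in a loop that stops once a batch comes back with no hits. Below is a minimal bash sketch, assuming an Elasticsearch 1.x node at localhost:9200 (overridable through a hypothetical ES_URL variable) and compact, non-pretty JSON responses. It uses plain sed/grep to pull out _scroll_id to avoid extra dependencies; extract_scroll_id and scroll_all are hypothetical helper names, and a real JSON tool such as jq would be more robust.

```shell
#!/usr/bin/env bash
# Sketch: page through every document with scan/scroll (Elasticsearch 1.x style).
# Assumptions (not from the original answer): ES_URL points at the node, and
# responses are compact JSON so sed/grep can find _scroll_id and empty hits.

ES_URL="${ES_URL:-localhost:9200}"

# Hypothetical helper: read a compact JSON response on stdin, print its _scroll_id.
extract_scroll_id() {
  sed -n 's/.*"_scroll_id":"\([^"]*\)".*/\1/p'
}

# Hypothetical helper: print each scroll response that contains hits, one per line.
scroll_all() {
  local resp scroll_id
  # Open the scan: this returns a scroll ID but no documents yet,
  # so its (empty) hits are deliberately not checked.
  resp=$(curl -s -XGET "$ES_URL/_search?search_type=scan&scroll=10m&size=1000" \
    -d '{ "query" : { "match_all" : {} } }')
  scroll_id=$(printf '%s' "$resp" | extract_scroll_id)

  while [ -n "$scroll_id" ]; do
    # Fetch the next batch: up to size x (number of primary shards) documents.
    resp=$(curl -s -XGET "$ES_URL/_search/scroll?scroll=10m" -d "$scroll_id")
    # An empty hits array means the scroll is exhausted.
    if printf '%s' "$resp" | grep -qF '"hits":[]'; then
      break
    fi
    printf '%s\n' "$resp"
    # Always continue with the newest scroll ID, not the first one.
    scroll_id=$(printf '%s' "$resp" | extract_scroll_id)
  done
}
```

For example, `scroll_all > dump.jsonl` writes one raw scroll response per line. The loop also ends if a response has no _scroll_id at all (for instance an error), so it cannot spin forever on a failed request.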
John Petrone