Looping through all documents in the elasticsearch index

Question

Using the Elasticsearch JavaScript client (Node.js), what is the best (or easiest) way to loop through every document in an index (about 100,000 documents)?

+7

elasticsearch

user1612947 May 24, '14 at 13:23

1 answer

John petrone · Accepted Answer · 2014-05-24T18:34:54+0000

I think a good place to start is a scan request using the scroll API:

http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/scan-scroll.html

It works much like a database cursor: you open a query with a time limit, and it returns a scroll identifier. You then use this scroll ID to fetch the first batch of results, which comes back with the documents and a new scroll ID. Examples below:

curl -XGET 'localhost:9200/_search?search_type=scan&scroll=10m&size=1000' -d ' { "query" : { "match_all" : {} } } '

This will return _scroll_id, which is then used to retrieve documents:

 curl -XGET 'localhost:9200/_search/scroll?scroll=10m' -d '<_SCROLL_ID_HERE>'

Note that this returns 1000 documents PER PRIMARY SHARD, so if you have 4 primary shards, each call returns up to 4000 documents. In addition to the documents, each call returns a new _scroll_id, which you then use for the next call. "scroll=10m" sets a 10-minute time limit that keeps the scroll open between calls.
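Since the question asks about Node.js, here is a rough sketch of the same loop in JavaScript. It is not tied to a particular client version: firstPage and nextPage are hypothetical wrappers that you would write around your client's search and scroll calls, and the sketch assumes each one resolves to the parsed response body (an object with _scroll_id and hits.hits). The name forEachDocument is also made up for illustration.

```javascript
// Minimal sketch: drain a scroll cursor page by page.
// firstPage()      -> hypothetical wrapper around the initial search request
// nextPage(id)     -> hypothetical wrapper around the scroll request
// onDocument(hit)  -> called once for every hit
// Both wrappers are assumed to resolve to the parsed response body.
async function forEachDocument(firstPage, nextPage, onDocument) {
  let page = await firstPage();
  let isFirst = true;
  let count = 0;

  while (true) {
    const hits = (page.hits && page.hits.hits) || [];

    // A scan request returns no hits in its first response, so only an
    // empty page after the first one means the cursor is exhausted.
    if (hits.length === 0 && !isFirst) break;

    for (const hit of hits) {
      await onDocument(hit);
      count += 1;
    }

    isFirst = false;
    // Always pass along the most recent scroll id, as described above.
    page = await nextPage(page._scroll_id);
  }

  return count;
}
```

With a real client, firstPage would issue the search request with the scroll time limit and nextPage would call the client's scroll method with the id it receives. Parameter names differ between client versions, so check the documentation for the one you have installed.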
