Different Elasticsearch search results for the same query

I have set up Elasticsearch as 1 cluster with 4 nodes. Number of shards per index: 1; number of replicas per index: 3.

When I run a simple query such as the following several times, I get different results (different total hits and different top 10 documents):

http://localhost:9200/index_name/_search?q=term 

Does each shard hold different data? I would expect all shard copies to hold the same data. What can I do?

This is the result of /_cluster/health:

 {
   "cluster_name" : "secret",
   "status" : "green",
   "timed_out" : false,
   "number_of_nodes" : 4,
   "number_of_data_nodes" : 4,
   "active_primary_shards" : 24,
   "active_shards" : 96,
   "relocating_shards" : 0,
   "initializing_shards" : 0,
   "unassigned_shards" : 0
 }

As a workaround, I am rebuilding the index using the Ruby gem Tire: ModelName.rebuild_index

But I need a long-term solution.

3 answers

This is because you did not specify a sort order and size. Without them, every request returns an arbitrary first 10 records, since the default size of an Elasticsearch result set is 10.

You can add sorting as follows, using curl:

 curl -XPOST 'localhost:9200/_search' -H 'Content-Type: application/json' -d '{ "query" : { ... }, "sort" : [ {"price" : {"order" : "asc", "mode" : "avg"}} ] }' 

See the Elasticsearch documentation for more information, in particular on from and size, which together with sorting are most often used for pagination.
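A minimal sketch of such a request body in Python, assuming a hypothetical unique field named `id` (not part of the original question) to break ties between documents with equal scores. The `title` field in the example query is likewise a placeholder.

```python
import json


def build_search_body(query, page, page_size=10, tiebreak_field="id"):
    """Build a search body with an explicit sort order and pagination.

    tiebreak_field is a hypothetical unique, sortable field; replace it
    with whatever unique field your own mapping provides.
    """
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    return {
        "query": query,
        "sort": [
            {"_score": {"order": "desc"}},       # best matches first
            {tiebreak_field: {"order": "asc"}},  # fixed order for equal scores
        ],
        "from": page * page_size,  # offset of the first hit on this page
        "size": page_size,         # number of hits per page
    }


# Third page (zero-based page index 2) of a placeholder match query.
body = build_search_body({"match": {"title": "term"}}, page=2)
print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the JSON body of a `_search` request. Note that a tie-breaker only fixes the order of documents whose scores are equal; it does not change the scores themselves.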

Update:

The default sort order is _score DESC, but it sometimes does not help when records have no meaningful _score to distinguish them; see http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_sorting.html#_sorting


This question helped me. As the answer there says:

One possible reason is the way IDF is distributed: by default, Elasticsearch computes IDF locally on each shard to save some performance, which leads to different IDF values across the cluster.

The Elasticsearch documentation covers this.
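To illustrate why shard-local statistics change the score, here is a small sketch using the BM25-style IDF formula, ln(1 + (N - n + 0.5) / (n + 0.5)), where N is the number of documents a shard sees and n is the number of those documents containing the term. The counts below are made up for illustration and do not come from the cluster in the question.

```python
import math


def bm25_idf(doc_count, doc_freq):
    """BM25-style inverse document frequency for a single term.

    doc_count: number of documents the shard knows about
    doc_freq:  number of those documents that contain the term
    """
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical local statistics for the same term on two shards.
idf_shard_a = bm25_idf(doc_count=1000, doc_freq=10)
idf_shard_b = bm25_idf(doc_count=1200, doc_freq=30)

print(f"shard A idf: {idf_shard_a:.3f}")
print(f"shard B idf: {idf_shard_b:.3f}")
print("same idf on both shards:", math.isclose(idf_shard_a, idf_shard_b))
```

Because IDF feeds directly into _score, the same document and query can score differently depending on which shard's statistics are used.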


We ran into a similar problem, and it turned out that Elasticsearch round-robins between the different shard copies when searching. Each copy returns slightly different _score values, because its index state differs slightly depending on which documents it currently holds. In our case, this meant that similar results were often placed slightly lower or higher in the result order, and in combination with pagination (using from and size in the search query), this meant that the same results appeared on two separate "pages" or not at all from page to page.

We found an Elasticsearch article on consistent scoring that explains this fairly accurately, and we implemented the preference parameter to ensure that we always get the same scores for a particular search by querying the same shards:

 http://localhost:9200/index_name/_search?q=term&preference=blablabla 
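One way to choose the preference value, sketched in Python: derive it from something stable per user, such as a session ID, so that each user keeps hitting the same shard copies while load still spreads across users. The function name, the session ID, and the hashing step are assumptions for illustration, not something our setup or Elasticsearch requires.

```python
import hashlib
from urllib.parse import urlencode


def search_url(base_url, index, term, session_id):
    """Build a _search URL whose preference value is stable per session.

    Hashing keeps the raw session ID out of URLs and logs. A hex digest
    never starts with an underscore, so it cannot be mistaken for one of
    the reserved preference values that begin with "_".
    """
    digest = hashlib.sha1(session_id.encode("utf-8")).hexdigest()
    preference = digest[:16]  # a short prefix is enough to be stable
    params = urlencode({"q": term, "preference": preference})
    return f"{base_url}/{index}/_search?{params}"


# Hypothetical session ID; the same session always yields the same URL.
url = search_url("http://localhost:9200", "index_name", "term", "session-1234")
print(url)
```

Any string that stays constant for the duration of a user's paging session would work equally well as the preference value.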

We also thought about using sorting, but Elasticsearch already sorts results with the same score by the internal Lucene document ID, which ensures that results with the same score are always returned in the same order.
