What leads to different search results for the same search query on two Elasticsearch nodes?

I have a two-node Elasticsearch setup in which the same search query produces different results on one node than on the other, and I would like to know why. Details:

  • The same documents (identical content and identifier) have different scores on the two nodes, which leads to a different sort order.
  • This is reproducible: I can delete the entire index and rebuild it from the database, and the results still differ.
  • The two ES nodes are embedded in a Java EE WAR. With each deployment, the index is rebuilt from the database.
  • When the problem was first discovered, even the hits.total values for the same query differed between the two nodes. They match again after deleting and rebuilding the index.
  • My current workaround is to use preference=_local, as suggested here (see the sketch after this list).
  • So far I have not been able to find any interesting errors in the logs.
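
A minimal sketch of that workaround (the index name abc and the field address.city come from the details below; host, port and the query itself are placeholders):

 curl -XGET 'http://localhost:9200/abc/_search?preference=_local' -d '
 {
   "query": { "match": { "address.city": "Bremen" } }
 }'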

_cluster/state:

{ "cluster_name": "elasticsearch.abc", "version": 330, "master_node": "HexGKOoHSxqRaMmwduCVIA", "blocks": {}, "nodes": { "rUZDrUfMR1-RWcy4t0YQNw": { "name": "Owl", "transport_address": "inet[/10.123.123.123:9303]", "attributes": {} }, "HexGKOoHSxqRaMmwduCVIA": { "name": "Bloodlust II", "transport_address": "inet[/10.123.123.124:9303]", "attributes": {} } }, "metadata": { "templates": {}, "indices": { "abc": { "state": "open", "settings": { "index": { "creation_date": "1432297566361", "uuid": "LKx6Ro9CRXq6JZ9a29jWeA", "analysis": { "filter": { "substring": { "type": "nGram", "min_gram": "1", "max_gram": "50" } }, "analyzer": { "str_index_analyzer": { "filter": [ "lowercase", "substring" ], "tokenizer": "keyword" }, "str_search_analyzer": { "filter": [ "lowercase" ], "tokenizer": "keyword" } } }, "number_of_replicas": "1", "number_of_shards": "5", "version": { "created": "1050099" } } }, "mappings": { "some_mapping": { ... } ... }, "aliases": [] } } }, "routing_table": { "indices": { "abc": { "shards": { "0": [ { "state": "STARTED", "primary": true, "node": "HexGKOoHSxqRaMmwduCVIA", "relocating_node": null, "shard": 0, "index": "abc" }, { "state": "STARTED", "primary": false, "node": "rUZDrUfMR1-RWcy4t0YQNw", "relocating_node": null, "shard": 0, "index": "abc" } ], "1": [ { "state": "STARTED", "primary": false, "node": "HexGKOoHSxqRaMmwduCVIA", "relocating_node": null, "shard": 1, "index": "abc" }, { "state": "STARTED", "primary": true, "node": "rUZDrUfMR1-RWcy4t0YQNw", "relocating_node": null, "shard": 1, "index": "abc" } ], "2": [ { "state": "STARTED", "primary": true, "node": "HexGKOoHSxqRaMmwduCVIA", "relocating_node": null, "shard": 2, "index": "abc" }, { "state": "STARTED", "primary": false, "node": "rUZDrUfMR1-RWcy4t0YQNw", "relocating_node": null, "shard": 2, "index": "abc" } ], "3": [ { "state": "STARTED", "primary": false, "node": "HexGKOoHSxqRaMmwduCVIA", "relocating_node": null, "shard": 3, "index": "abc" }, { "state": "STARTED", "primary": true, "node": "rUZDrUfMR1-RWcy4t0YQNw", "relocating_node": null, "shard": 3, "index": "abc" } ], "4": [ { "state": "STARTED", "primary": true, "node": "HexGKOoHSxqRaMmwduCVIA", "relocating_node": null, "shard": 4, "index": "abc" }, { "state": "STARTED", "primary": false, "node": "rUZDrUfMR1-RWcy4t0YQNw", "relocating_node": null, "shard": 4, "index": "abc" } ] } } } }, "routing_nodes": { "unassigned": [], "nodes": { "HexGKOoHSxqRaMmwduCVIA": [ { "state": "STARTED", "primary": true, "node": "HexGKOoHSxqRaMmwduCVIA", "relocating_node": null, "shard": 4, "index": "abc" }, { "state": "STARTED", "primary": true, "node": "HexGKOoHSxqRaMmwduCVIA", "relocating_node": null, "shard": 0, "index": "abc" }, { "state": "STARTED", "primary": false, "node": "HexGKOoHSxqRaMmwduCVIA", "relocating_node": null, "shard": 3, "index": "abc" }, { "state": "STARTED", "primary": false, "node": "HexGKOoHSxqRaMmwduCVIA", "relocating_node": null, "shard": 1, "index": "abc" }, { "state": "STARTED", "primary": true, "node": "HexGKOoHSxqRaMmwduCVIA", "relocating_node": null, "shard": 2, "index": "abc" } ], "rUZDrUfMR1-RWcy4t0YQNw": [ { "state": "STARTED", "primary": false, "node": "rUZDrUfMR1-RWcy4t0YQNw", "relocating_node": null, "shard": 4, "index": "abc" }, { "state": "STARTED", "primary": false, "node": "rUZDrUfMR1-RWcy4t0YQNw", "relocating_node": null, "shard": 0, "index": "abc" }, { "state": "STARTED", "primary": true, "node": "rUZDrUfMR1-RWcy4t0YQNw", "relocating_node": null, "shard": 3, "index": "abc" }, { "state": "STARTED", "primary": true, "node": 
"rUZDrUfMR1-RWcy4t0YQNw", "relocating_node": null, "shard": 1, "index": "abc" }, { "state": "STARTED", "primary": false, "node": "rUZDrUfMR1-RWcy4t0YQNw", "relocating_node": null, "shard": 2, "index": "abc" } ] } }, "allocations": [] } 

_cluster/health

 { "cluster_name": "elasticsearch.abc", "status": "green", "timed_out": false, "number_of_nodes": 2, "number_of_data_nodes": 2, "active_primary_shards": 5, "active_shards": 10, "relocating_shards": 0, "initializing_shards": 0, "unassigned_shards": 0, "number_of_pending_tasks": 0 } 

_cluster/stats

 { "timestamp": 1432312770877, "cluster_name": "elasticsearch.abc", "status": "green", "indices": { "count": 1, "shards": { "total": 10, "primaries": 5, "replication": 1, "index": { "shards": { "min": 10, "max": 10, "avg": 10 }, "primaries": { "min": 5, "max": 5, "avg": 5 }, "replication": { "min": 1, "max": 1, "avg": 1 } } }, "docs": { "count": 19965, "deleted": 4 }, "store": { "size_in_bytes": 399318082, "throttle_time_in_millis": 0 }, "fielddata": { "memory_size_in_bytes": 60772, "evictions": 0 }, "filter_cache": { "memory_size_in_bytes": 15284, "evictions": 0 }, "id_cache": { "memory_size_in_bytes": 0 }, "completion": { "size_in_bytes": 0 }, "segments": { "count": 68, "memory_in_bytes": 10079288, "index_writer_memory_in_bytes": 0, "index_writer_max_memory_in_bytes": 5120000, "version_map_memory_in_bytes": 0, "fixed_bit_set_memory_in_bytes": 0 }, "percolate": { "total": 0, "time_in_millis": 0, "current": 0, "memory_size_in_bytes": -1, "memory_size": "-1b", "queries": 0 } }, "nodes": { "count": { "total": 2, "master_only": 0, "data_only": 0, "master_data": 2, "client": 0 }, "versions": [ "1.5.0" ], "os": { "available_processors": 8, "mem": { "total_in_bytes": 0 }, "cpu": [] }, "process": { "cpu": { "percent": 0 }, "open_file_descriptors": { "min": 649, "max": 654, "avg": 651 } }, "jvm": { "max_uptime_in_millis": 2718272183, "versions": [ { "version": "1.7.0_40", "vm_name": "Java HotSpot(TM) 64-Bit Server VM", "vm_version": "24.0-b56", "vm_vendor": "Oracle Corporation", "count": 2 } ], "mem": { "heap_used_in_bytes": 2665186528, "heap_max_in_bytes": 4060086272 }, "threads": 670 }, "fs": { "total_in_bytes": 631353901056, "free_in_bytes": 209591468032, "available_in_bytes": 209591468032 }, "plugins": [] } } 

Request example:

 /_search?from=22&size=1 { "query": { "bool": { "should": [{ "match": { "address.city": { "query": "Bremen", "boost": 2 } } }], "must": [{ "match": { "type": "L" } }] } } } 

Response to the first request

 { "took": 30, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 19543, "max_score": 6.407021, "hits": [{ "_index": "abc", "_type": "xyz", "_id": "ABC123", "_score": 5.8341036, "_source": { ... } }] } } 

Response to the second request

 { "took": 27, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 19543, "max_score": 6.407021, "hits": [ { "_index": "abc", "_type": "xyz", "_id": "FGH12343", "_score": 5.8341036, "_source": { ... } } ] } 

}

What could be the reason for this, and how can I get the same results from different nodes?

Explained query request: /abc/mytype/_search?from=0&size=1&search_type=dfs_query_then_fetch&explain=true

 { "query": { "bool": { "should": [{ "match": { "address.city": { "query": "Karlsruhe", "boost": 2 } } }] } } } 

Response to the first request

 { "took": 5, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 41, "max_score": 7.211497, "hits": [ { "_shard": 0, "_node": "rUZDrUfMR1-RWcy4t0YQNw", "_index": "abc", "_type": "mytype", "_id": "abc123", "_score": 7.211497, "_source": {... }, "_explanation": { "value": 7.211497, "description": "weight(address.city:karlsruhe^2.0 in 1598) [PerFieldSimilarity], result of:", "details": [ { "value": 7.211497, "description": "fieldWeight in 1598, product of:", "details": [ { "value": 1, "description": "tf(freq=1.0), with freq of:", "details": [ { "value": 1, "description": "termFreq=1.0" } ] }, { "value": 7.211497, "description": "idf(docFreq=46, maxDocs=23427)" }, { "value": 1, "description": "fieldNorm(doc=1598)" } ] } ] } } ] } } 

Response to the second request

 { "took": 6, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 41, "max_score": 7.194322, "hits": [ { "_shard": 0, "_node": "rUZDrUfMR1-RWcy4t0YQNw", "_index": "abc", "_type": "mytype", "_id": "abc123", "_score": 7.194322, "_source": {... }, "_explanation": { "value": 7.194322, "description": "weight(address.city:karlsruhe^2.0 in 1598) [PerFieldSimilarity], result of:", "details": [ { "value": 7.194322, "description": "fieldWeight in 1598, product of:", "details": [ { "value": 1, "description": "tf(freq=1.0), with freq of:", "details": [ { "value": 1, "description": "termFreq=1.0" } ] }, { "value": 7.194322, "description": "idf(docFreq=48, maxDocs=24008)" }, { "value": 1, "description": "fieldNorm(doc=1598)" } ] } ] } } ] } } 
1 answer

The hits.total mismatch is most likely due to the primary shards and their replicas getting out of sync. This can happen when a node leaves the cluster (for whatever reason) while changes to the documents (indexing, deleting, updating) continue.
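
One way to confirm such a divergence (a sketch; host and port are placeholders, the node IDs are taken from the cluster state above, and each node here holds a complete copy of the index): run the same query pinned to each node's shard copies and compare hits.total.

 curl -XGET 'http://localhost:9200/abc/_search?preference=_only_node:HexGKOoHSxqRaMmwduCVIA&size=0' -d '
 { "query": { "match": { "type": "L" } } }'

 curl -XGET 'http://localhost:9200/abc/_search?preference=_only_node:rUZDrUfMR1-RWcy4t0YQNw&size=0' -d '
 { "query": { "match": { "type": "L" } } }'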

The scoring part is a different story altogether, and it can be explained with the "Relevance" section of this blog post:

When executing a search, Elasticsearch faces an interesting dilemma. Your query needs to find all the relevant documents ... but those documents are scattered across any number of shards in your cluster. Each shard is basically a Lucene index that maintains its own TF and DF statistics. A shard only knows how many times "pineapple" appears within that shard, not within the whole cluster.
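
The two explain responses in the question illustrate this: they differ only in the reported term statistics. As a rough check (assuming the default Lucene TF/IDF similarity in ES 1.5, where idf = 1 + ln(maxDocs / (docFreq + 1))), those statistics reproduce the two scores, since tf and fieldNorm are both 1:

 idf = 1 + ln(23427 / (46 + 1)) ≈ 7.2115   (first response:  docFreq=46, maxDocs=23427)
 idf = 1 + ln(24008 / (48 + 1)) ≈ 7.1943   (second response: docFreq=48, maxDocs=24008)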

I would try searching with "DFS Query Then Fetch", i.e. _search?search_type=dfs_query_then_fetch, which should help with the accuracy of the scoring.
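
A minimal sketch of such a request, reusing the example query from the question (host and port are placeholders):

 curl -XGET 'http://localhost:9200/abc/_search?search_type=dfs_query_then_fetch&from=22&size=1' -d '
 {
   "query": {
     "bool": {
       "should": [ { "match": { "address.city": { "query": "Bremen", "boost": 2 } } } ],
       "must": [ { "match": { "type": "L" } } ]
     }
   }
 }'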

In addition, the difference in document counts caused by document changes while a node was down affects the score calculation, even after the index has been deleted and rebuilt. This may be because the document changes played out differently on the replica and on the primary shards; more specifically, because of deleted documents. A deleted document is only physically removed from the index when segments are merged, and segment merging does not happen until certain conditions are met in the underlying Lucene instance.
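
One way to check whether the copies have diverged in this way (a sketch; the exact response layout depends on the version): compare docs.count and docs.deleted for the primary and the replica copy of each shard.

 curl -XGET 'http://localhost:9200/abc/_stats?level=shards&pretty'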

A merge can be forced by sending a POST to /_optimize?max_num_segments=1. A warning: this takes a very long time (depending on the size of the index), requires significant I/O and CPU resources, and should not be run on an index that is still receiving changes. Documentation: Optimize, Segment Merging
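
For example (host, port and index name are placeholders for your setup):

 curl -XPOST 'http://localhost:9200/abc/_optimize?max_num_segments=1'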
