GET Consistency (and quorum) in ElasticSearch

I am new to ElasticSearch and I am evaluating it for a project.

In ES, replication can be synchronous or asynchronous. With async replication, the client gets a success response as soon as the document is written to the primary shard. The document is then pushed to the replicas asynchronously.

If we write asynchronously, how can we be sure that a GET returns the data even if it has not yet been applied to all replicas? When we perform a GET in ES, the request is forwarded to one of the copies of the corresponding shard. If we wrote asynchronously, the primary shard may have the document, but the replica selected for the GET may not have received or written it yet. In Cassandra, we can specify consistency levels (ONE, QUORUM, ALL) at write time as well as at read time. Is something like that possible for reads in ES?

2 answers

That's right, you can set replication to async (the default is sync) so that you don't wait for the replicas, although in practice this does not buy you much.

Whenever you read data, you can specify the preference parameter to control where the documents are taken from. If you use preference:_primary, you make sure that you always take the document from the primary shard. Otherwise, if the get is executed before the document is available on all replicas, you may hit a shard copy that does not have it yet. Given that the get API works in real time, it usually makes sense to keep replication sync, so that once the index operation returns you can always get the document back by id from any shard copy that is supposed to contain it. Still, if you try to get a document back while it is being indexed for the first time, you might not find it.
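As a rough illustration of the preference parameter, here is a minimal Python sketch that only builds the request URLs for a get by id, with and without pinning the read to the primary. The host, index name, type name and document id are hypothetical placeholders, and the `get_url` helper is made up for this example. The `_primary` value belongs to the Elasticsearch versions discussed in this thread; check the documentation for your version before relying on it.

```python
from urllib.parse import quote, urlencode


def get_url(base, index, doc_type, doc_id, preference=None):
    """Build the URL for a get-by-id request (hypothetical helper).

    When `preference` is given it is appended as a query parameter,
    which is how the get API is told where to read the document from.
    """
    path = "/".join(quote(part, safe="") for part in (index, doc_type, doc_id))
    url = f"{base.rstrip('/')}/{path}"
    if preference is not None:
        url += "?" + urlencode({"preference": preference})
    return url


# Placeholder host, index, type and id, for illustration only.
primary_url = get_url("http://localhost:9200", "myindex", "mytype", "1", "_primary")
any_copy_url = get_url("http://localhost:9200", "myindex", "mytype", "1")

print(primary_url)
print(any_copy_url)
```

Sending a GET to the first URL asks for the document from the primary shard; the second leaves the choice of shard copy to Elasticsearch.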

Elasticsearch also has a write consistency parameter, but it differs from how other data stores work, and it is not related to whether replication is sync or async. With the consistency parameter you can control how many copies of the data need to be available for a write operation to be permissible. If not enough copies are available, the write operation fails (after waiting for up to 1 minute, an interval that you can change with the timeout parameter). This is only a preliminary check to decide whether to accept the operation. It does not mean that the operation is rejected if it fails on a replica. In fact, if a write operation fails on a replica but succeeds on the primary, the assumption is that something is wrong with the replica (or with the hardware it is running on), so the shard is marked as failed and recreated on another node. The default value for consistency is quorum, and it can also be set to one or all.
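The preliminary check described above can be sketched in a few lines of Python. This is an illustration, not Elasticsearch code: the function names are hypothetical, and the quorum arithmetic (a majority of the copies, applied only when a shard has more than two copies) is an assumption about how older versions behaved, not something stated in this answer.

```python
def required_copies(consistency, number_of_replicas):
    """Return how many shard copies must be active before a write is accepted."""
    total = 1 + number_of_replicas  # the primary plus its replicas
    if consistency == "one":
        return 1
    if consistency == "all":
        return total
    if consistency == "quorum":
        # Assumption: a majority is required only with more than two copies,
        # so an index with a single replica can still be written to when
        # that replica is down.
        return total // 2 + 1 if total > 2 else 1
    raise ValueError(f"unknown consistency level: {consistency!r}")


def write_allowed(consistency, number_of_replicas, active_copies):
    """Pre-flight check only: says nothing about whether replicas apply the write."""
    return active_copies >= required_copies(consistency, number_of_replicas)


# One primary and two replicas, with only the primary active.
print(write_allowed("quorum", 2, active_copies=1))
print(write_allowed("one", 2, active_copies=1))
```

The check only decides whether the write is accepted at all. What happens when a replica later fails to apply the write is handled separately, as the answer explains.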

That said, when it comes to the get API, elasticsearch is not eventually consistent but simply consistent: once a document is indexed, you can get it back.

The fact that newly added documents are not available for search until the next refresh, which by default happens automatically every second, is not really about eventual consistency (the documents are there and can be retrieved by id). It has more to do with how search and Lucene work, and how documents become visible through Lucene.
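The difference between getting by id and searching can be pictured with a toy model. The Python sketch below is not how Elasticsearch or Lucene are implemented; `ToyShard` and its methods are hypothetical names, and the two dictionaries only stand in for "indexed but not yet refreshed" and "visible to search".

```python
class ToyShard:
    """Toy model of one shard copy: real-time get, search only after refresh."""

    def __init__(self):
        self._pending = {}     # indexed, but not yet visible to search
        self._searchable = {}  # visible to search after a refresh

    def index(self, doc_id, doc):
        self._pending[doc_id] = doc

    def get(self, doc_id):
        # Real-time get: look at not-yet-refreshed documents first.
        if doc_id in self._pending:
            return self._pending[doc_id]
        return self._searchable.get(doc_id)

    def search(self, field, value):
        # Search only sees documents that a refresh has made visible.
        return sorted(
            doc_id
            for doc_id, doc in self._searchable.items()
            if doc.get(field) == value
        )

    def refresh(self):
        self._searchable.update(self._pending)
        self._pending.clear()


shard = ToyShard()
shard.index("1", {"user": "kimchy"})

found_by_get = shard.get("1")                           # available immediately
found_before_refresh = shard.search("user", "kimchy")   # not searchable yet
shard.refresh()
found_after_refresh = shard.search("user", "kimchy")    # searchable now
```

In this model the document is never missing from the shard; it is only invisible to search until the refresh runs, which is the distinction the answer draws.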


Here is the answer I gave on the mailing list:

As far as I understand the big picture, when you index a document, it is written to the transaction log, and then you get a successful response from ES. After that, asynchronously, it is replicated to other nodes and indexed by Lucene.

That said, you cannot search for the document immediately, but you can GET it. ES will read the tlog if needed when you GET the document.

I think (not sure) that if the replica is not up to date, the GET will be sent to the primary's tlog.

Correct me if I am wrong.
