Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded)

I am running read and update queries against a table with 500,000 rows, and several times I get the error below after processing about 300,000 rows, even though no nodes are down.

Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded)

Infrastructure Details:
5 Cassandra nodes, 5 Spark and 3 Hadoop nodes, each with 8 cores and 28 GB of memory, and a Cassandra replication factor of 3.

Cassandra 2.1.8.621 | DSE 4.7.1 | Spark 1.2.1 | Hadoop 2.7.1

Cassandra Configuration:

 read_request_timeout_in_ms: 10000
 range_request_timeout_in_ms: 10000
 write_request_timeout_in_ms: 5000
 cas_contention_timeout_in_ms: 1000
 truncate_request_timeout_in_ms: 60000
 request_timeout_in_ms: 10000

I tried the same job after increasing read_request_timeout_in_ms to 20,000, but it didn't help.
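Note that when the job reads through the DataStax Java driver, the driver also has its own per-request read timeout (12000 ms by default in the 2.1 driver) that is separate from the server-side settings above, so raising only the cassandra.yaml values may not be enough. A minimal sketch of raising it, assuming the DataStax Java driver 2.1.x:

 import com.datastax.driver.core.Cluster;
 import com.datastax.driver.core.Session;
 import com.datastax.driver.core.SocketOptions;

 public class ClusterSetup {
     public static void main(String[] args) {
         // The driver-side read timeout should stay above the server-side
         // read_request_timeout_in_ms, otherwise the client gives up on the
         // request before the coordinator does.
         Cluster cluster = Cluster.builder()
                 .addContactPoint("127.0.0.1")   // replace with a real node address
                 .withSocketOptions(new SocketOptions()
                         .setReadTimeoutMillis(20000))
                 .build();
         Session session = cluster.connect();
         // ... run queries ...
         cluster.close();
     }
 }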

I am making queries on two tables. The following is the create statement for one of the tables:

Create table:

 CREATE TABLE section_ks.testproblem_section (
     problem_uuid text PRIMARY KEY,
     documentation_date timestamp,
     mapped_code_system text,
     mapped_problem_code text,
     mapped_problem_text text,
     mapped_problem_type_code text,
     mapped_problem_type_text text,
     negation_ind text,
     patient_id text,
     practice_uid text,
     problem_category text,
     problem_code text,
     problem_comment text,
     problem_health_status_code text,
     problem_health_status_text text,
     problem_onset_date timestamp,
     problem_resolution_date timestamp,
     problem_status_code text,
     problem_status_text text,
     problem_text text,
     problem_type_code text,
     problem_type_text text,
     target_site_code text,
     target_site_text text
 ) WITH bloom_filter_fp_chance = 0.01
     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
     AND comment = ''
     AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
     AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
     AND dclocal_read_repair_chance = 0.1
     AND default_time_to_live = 0
     AND gc_grace_seconds = 864000
     AND max_index_interval = 2048
     AND memtable_flush_period_in_ms = 0
     AND min_index_interval = 128
     AND read_repair_chance = 0.0
     AND speculative_retry = '99.0PERCENTILE';

Queries:

1) SELECT encounter_uuid, encounter_start_date FROM section_ks.encounters WHERE patient_id = '1234' AND encounter_start_date >= '" + formatted_documentation_date + "' ALLOW FILTERING;

2) UPDATE section_ks.encounters SET testproblem_uuid_set = testproblem_uuid_set + {'1256'} WHERE encounter_uuid = 'abcd345';

cassandra hadoop apache-spark datastax-java-driver datastax
2 answers

Usually when you get a timeout error it means you are trying to do something that doesn't scale well in Cassandra. The fix is often to change your schema.

I suggest you watch the nodes during query execution to see if you can spot the problem area. For example, you can run "watch -n 1 nodetool tpstats" to see if any queues are backing up or dropping items. See other monitoring suggestions here.

One thing that looks off in your configuration is that you say you have five Cassandra nodes but only 3 Spark workers (or are you saying you have three Spark workers on each Cassandra node?). You will want at least one Spark worker on each Cassandra node so that loading data into Spark happens locally on each node and not over the network.

It is hard to say much more without seeing your schema and the query you are running. Are you reading from a single partition? I started getting timeout errors around 300,000 rows when reading from a single partition. See the question here. The only workaround I have found so far is to use a client-side hash in my partition key to break the partitions up into smaller chunks of around 100,000 rows. So far I have not found a way to tell Cassandra not to time out a query that I expect to take a long time.
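To make that bucketing idea concrete, here is a hypothetical sketch (the table, column names, and bucket count are illustrative, not from the original post): an extra bucket column is added to the partition key so that one large logical partition is split into N physical partitions of roughly 100,000 rows each.

 import com.datastax.driver.core.Session;

 public class BucketedReader {
     // Number of buckets each logical partition is split into; sized so
     // that each bucket holds roughly 100k rows. Illustrative value.
     static final int NUM_BUCKETS = 10;

     // Assumed (hypothetical) table:
     // CREATE TABLE ks.events (
     //     patient_id text, bucket int, event_date timestamp, payload text,
     //     PRIMARY KEY ((patient_id, bucket), event_date));

     // Derive the bucket from a stable row attribute, e.g. a hash of a
     // row-level id, so writes spread evenly across the buckets.
     static int bucketFor(String rowId) {
         return Math.abs(rowId.hashCode() % NUM_BUCKETS);
     }

     // Reading the whole logical partition means querying each bucket in
     // turn, so no single read has to scan more than ~100k rows.
     static void readAll(Session session, String patientId) {
         for (int b = 0; b < NUM_BUCKETS; b++) {
             session.execute(
                 "SELECT * FROM ks.events WHERE patient_id = ? AND bucket = ?",
                 patientId, b);
         }
     }
 }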


I don't think the configuration is the root cause; the problem is in the data model.

It would be great to see the structure of the section_ks.encounters table.

It is recommended to think carefully about what specific queries you expect to run before designing a table.

As far as I can see, these two queries require different section_ks.encounters structures in order to run with good performance.

Let's review each provided query and try to design tables for them:

First:

SELECT encounter_uuid, encounter_start_date FROM section_ks.encounters WHERE patient_id = '1234' AND encounter_start_date >= '" + formatted_documentation_date + "' ALLOW FILTERING;

  • First point: if Cassandra forces you to add ALLOW FILTERING, that is a sign of a non-optimal query or table structure.
  • Second point: the primary key. A great explanation of what primary keys are in Cassandra can be found here. This query will run quickly and without the mandatory ALLOW FILTERING statement if the patient_id and encounter_start_date columns form a composite primary key. The order of the columns inside the PRIMARY KEY() statement must match the filtering order in your query.
  • Why is ALLOW FILTERING required in the original query? By the partition key, Cassandra knows which node the data is stored on. If the patient_id column is not a partition key, Cassandra would have to scan all 5 nodes to find the requested patient. When there is a lot of data on the nodes, such a full scan usually ends in a timeout.

Here is an example table structure that is effective for this query:

 create table section_ks.encounters (
     patient_id text,
     encounter_start_date timestamp,
     encounter_uuid text,
     some_other_non_unique_column text,
     PRIMARY KEY (patient_id, encounter_start_date)
 );
  • patient_id will be the "partition key". It is responsible for distributing data across the Cassandra nodes. In simple terms (leaving replication aside): different ranges of patients will be stored on different nodes.
  • encounter_start_date will be the "clustering key". It is responsible for sorting data within the partition.

ALLOW FILTERING can now be removed from the query:

 SELECT encounter_uuid, encounter_start_date FROM section_ks.encounters WHERE patient_id = '1234' AND encounter_start_date >= '2017-08-19'; 
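For completeness, here is a hedged sketch of running this fixed query from the DataStax Java driver with a bound parameter instead of string concatenation (as in the original request); a prepared statement also lets a token-aware driver route the read straight to a replica that owns the partition:

 import java.util.Date;

 import com.datastax.driver.core.PreparedStatement;
 import com.datastax.driver.core.ResultSet;
 import com.datastax.driver.core.Row;
 import com.datastax.driver.core.Session;

 public class EncounterQuery {
     static void printEncounters(Session session, String patientId, Date fromDate) {
         // In practice, prepare once and reuse the PreparedStatement.
         PreparedStatement ps = session.prepare(
             "SELECT encounter_uuid, encounter_start_date "
             + "FROM section_ks.encounters "
             + "WHERE patient_id = ? AND encounter_start_date >= ?");
         ResultSet rs = session.execute(ps.bind(patientId, fromDate));
         for (Row row : rs) {
             // encounter_uuid is text and encounter_start_date is a timestamp
             // in the proposed schema, hence getString/getDate (driver 2.1 API).
             System.out.println(row.getString("encounter_uuid")
                 + " " + row.getDate("encounter_start_date"));
         }
     }
 }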

Second query:

UPDATE section_ks.encounters SET testproblem_uuid_set = testproblem_uuid_set + {'1256'} WHERE encounter_uuid = 'abcd345';

The table structure should look something like this:

 create table section_ks.encounters (
     encounter_uuid text,  -- partition key
     patient_id text,
     testproblem_uuid_set set<text>,
     some_other_non_unique_column text,
     PRIMARY KEY (encounter_uuid)
 );

If we want to filter quickly by encounter_uuid alone, it should be defined as the partition key.
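Since no single table can serve both queries efficiently, the usual Cassandra approach is to denormalize: keep one table per query and write each encounter to both, e.g. in a logged batch so the two tables stay consistent. A sketch under that assumption (the encounters_by_patient / encounters_by_uuid table names are hypothetical):

 import java.util.Date;

 import com.datastax.driver.core.BatchStatement;
 import com.datastax.driver.core.Session;

 public class EncounterWriter {
     // Writes one encounter into both query-specific tables. A logged batch
     // (the default BatchStatement type) guarantees that either both inserts
     // eventually apply or neither does.
     static void saveEncounter(Session session, String patientId,
                               String encounterUuid, Date startDate) {
         BatchStatement batch = new BatchStatement();
         batch.add(session.prepare(
                 "INSERT INTO section_ks.encounters_by_patient "
                 + "(patient_id, encounter_start_date, encounter_uuid) "
                 + "VALUES (?, ?, ?)")
             .bind(patientId, startDate, encounterUuid));
         batch.add(session.prepare(
                 "INSERT INTO section_ks.encounters_by_uuid "
                 + "(encounter_uuid, patient_id) VALUES (?, ?)")
             .bind(encounterUuid, patientId));
         session.execute(batch);
     }
 }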

Good articles on developing an effective data model:

