Order the latest entries for the tag in Cassandra

I am trying to display the latest values from a list of sensors. The list should also be sorted by timestamp.

I tried two different approaches. First, I included the sensor's update time in the primary key:

 CREATE TABLE sensors (
     customerid int,
     sensorid int,
     changedate timestamp,
     value text,
     PRIMARY KEY (customerid, changedate)
 ) WITH CLUSTERING ORDER BY (changedate DESC);

Then I can select the list as follows:

 select * from sensors where customerid=0 order by changedate desc; 

which leads to the following:

  customerid | changedate               | sensorid | value
 ------------+--------------------------+----------+-------
           0 | 2015-07-10 12:46:53+0000 |        1 |     2
           0 | 2015-07-10 12:46:52+0000 |        1 |     1
           0 | 2015-07-10 12:46:52+0000 |        0 |     2
           0 | 2015-07-10 12:46:26+0000 |        0 |     1

The problem is that I am not getting only the latest results, but all the old values too.

If I remove changedate from the primary key, the select fails:

 InvalidRequest: code=2200 [Invalid query] message="Order by is currently only supported on the clustered columns of the PRIMARY KEY, got changedate" 

Updating the sensor values does not work either:

 update overview set changedate=unixTimestampOf(now()), value = '5' where customerid=0 and sensorid=0;
 InvalidRequest: code=2200 [Invalid query] message="PRIMARY KEY part changedate found in SET part"

This fails because changedate is part of the primary key.

Is there a way to save only the most recent values from each sensor, and also keep a table ordered by timestamp?

Edit: In the meantime, I tried a different approach to keep only the most recent value.

I used this schema:

 CREATE TABLE sensors (
     customerid int,
     sensorid int,
     changedate timestamp,
     value text,
     PRIMARY KEY (customerid, sensorid, changedate)
 ) WITH CLUSTERING ORDER BY (changedate DESC);

Before inserting the latest value, I delete all the old values:

 DELETE FROM sensors WHERE customerid=? and sensorid=?;

But this fails because changedate is not part of the WHERE clause.

2 answers

The problem is that I am not getting only the latest results, but all the old values too.

Since you store the data with CLUSTERING ORDER BY (changedate DESC), getting the latest records is easy. All you have to do is add LIMIT to your query:

 select * from sensors where customerid=0 order by changedate desc limit 10;

This returns at most 10 records, those with the most recent changedate. Even with the limit, you are still guaranteed to get the latest records, because that is how your data is ordered.
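As a small sketch against the sensors table from the question: because the rows of a partition are already clustered by changedate in descending order, the explicit ORDER BY should be redundant, and LIMIT alone picks the newest rows. The customer id 0 is just the value used in the question.

```cql
-- Rows within a partition are stored newest-first
-- (CLUSTERING ORDER BY (changedate DESC)), so no ORDER BY is needed.
SELECT * FROM sensors WHERE customerid = 0 LIMIT 10;

-- Only the single most recent entry for this customer.
SELECT * FROM sensors WHERE customerid = 0 LIMIT 1;
```

Note that the limit applies to the customer's partition as a whole, not to each sensor separately.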

If I remove changedate from the primary key, the select fails.

This is because ORDER BY is only supported on clustering columns (the columns of the primary key that follow the partition key). A secondary index does not lift this restriction, and I would not recommend one here anyway.

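Sensor values are also not updated.

Your update query does not work because you are not allowed to assign part of the primary key in the SET clause. To make it work, move changedate into the WHERE clause. Note also that assignments in SET are separated by commas, not AND, and that unixTimestampOf(now()) returns a bigint rather than a timestamp, so toTimestamp(now()) is the safer choice (dateOf(now()) before Cassandra 2.2):

 update overview set value = '5', sensorid = 0 where customerid=0 and changedate=toTimestamp(now());

Keep in mind that, because changedate is part of the primary key, this statement writes a new row for the new timestamp rather than modifying an existing one.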

Is there a way to save only the most recent values from each sensor, and also keep a table ordered by timestamp?

You can do this by creating a separate table named "latest_sensor_data" with the same columns but a different primary key. The primary key becomes "customerid, sensorid", so there can be only one row per sensor: writing a new value for a sensor overwrites the previous row. Maintaining a separate table per query like this is called denormalization and is a common pattern in Cassandra data modeling. When you insert sensor data, you now insert into both "sensors" and "latest_sensor_data".

 CREATE TABLE latest_sensor_data (
     customerid int,
     sensorid int,
     changedate timestamp,
     value text,
     PRIMARY KEY (customerid, sensorid)
 );
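The double write could look roughly like the sketch below. It assumes the two tables defined above and uses made-up reading values; wrapping both inserts in a logged batch is one option (not the only one) for making sure that either both tables eventually receive the reading or neither does.

```cql
-- Illustrative values only; the timestamp and value are not from real data.
BEGIN BATCH
    -- Append the reading to the history table.
    INSERT INTO sensors (customerid, sensorid, changedate, value)
    VALUES (0, 1, '2015-07-10 12:46:53+0000', '2');

    -- Overwrite the single "latest" row for this sensor (same primary key,
    -- so the insert acts as an upsert).
    INSERT INTO latest_sensor_data (customerid, sensorid, changedate, value)
    VALUES (0, 1, '2015-07-10 12:46:53+0000', '2');
APPLY BATCH;
```

A logged batch costs more than two independent writes, so for high-volume sensor data it may be worth simply issuing both inserts separately and accepting a brief inconsistency.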

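Cassandra 3.0 introduces materialized views, which let Cassandra maintain a derived, differently keyed copy of a table for you. Be aware, though, that a view's primary key must contain every primary key column of its base table, so a view over the history table cannot by itself reduce it to one row per sensor; the separate table is still needed for that.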
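One place where a materialized view might help, sketched under the assumption of Cassandra 3.0 or later with materialized views enabled (newer releases may require enabling them explicitly): keep latest_sensor_data as the base table and let a view re-sort its rows by timestamp. The view name latest_sensor_data_by_date is hypothetical.

```cql
-- Hypothetical view: one row per sensor (from the base table),
-- but clustered newest-first within each customer.
CREATE MATERIALIZED VIEW latest_sensor_data_by_date AS
    SELECT customerid, sensorid, changedate, value
    FROM latest_sensor_data
    WHERE customerid IS NOT NULL
      AND sensorid IS NOT NULL
      AND changedate IS NOT NULL
    PRIMARY KEY (customerid, changedate, sensorid)
    WITH CLUSTERING ORDER BY (changedate DESC, sensorid ASC);

-- Latest value of every sensor for customer 0, most recently changed first.
SELECT * FROM latest_sensor_data_by_date WHERE customerid = 0;
```

The view's primary key includes both base key columns (customerid, sensorid) plus one additional column (changedate), which is the shape materialized views allow.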

Now the following query:

 select * from latest_sensor_data where customerid=0;

gives you the latest value of each sensor for this customer.

I would recommend renaming "sensors" to "sensor_data" or "sensor_history" to make it clearer what the table holds. In addition, you should change its primary key to "customerid, changedate, sensorid", as this allows several sensors to report at the same timestamp (which seems possible).
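Put together, the suggested history table might look like this sketch. It uses the sensor_history name proposed above; listing sensorid explicitly in the clustering order is my addition.

```cql
-- History table: one row per (customer, timestamp, sensor), newest first.
CREATE TABLE sensor_history (
    customerid int,
    sensorid int,
    changedate timestamp,
    value text,
    PRIMARY KEY (customerid, changedate, sensorid)
) WITH CLUSTERING ORDER BY (changedate DESC, sensorid ASC);
```

With sensorid as the last clustering column, two sensors reporting at exactly the same timestamp produce two rows instead of overwriting each other.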

+2
source

Your first approach seems reasonable. If you add "limit 1" to your query, you get only the latest result; use "limit 2" to see the latest two results, and so on.

If you want old values to be deleted from the table automatically, you can specify a TTL (time to live) for data points when you insert them. For example, to keep data for 10 days, add "USING TTL 864000" to your insert statements. Alternatively, you can set a default TTL for the whole table.
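Both TTL variants could be written as in the sketch below, using the sensors table from the question and illustrative values. 864000 seconds is 10 days (10 × 24 × 60 × 60).

```cql
-- Per-write TTL: this row expires 10 days after it is written.
INSERT INTO sensors (customerid, sensorid, changedate, value)
VALUES (0, 1, '2015-07-10 12:46:53+0000', '2')
USING TTL 864000;

-- Table-wide default: applies to writes that do not specify their own TTL.
ALTER TABLE sensors WITH default_time_to_live = 864000;
```

Changing the table default should only affect data written afterwards; rows that already exist keep whatever TTL they were written with.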
