Cassandra - overlapping date ranges

I have the following Tasks table in Cassandra.

  • TaskUID UUID - partition key
  • Starts_On TIMESTAMP - clustering column
  • Ends_On TIMESTAMP - clustering column

I want to run a CQL query to get the tasks that overlap a given date range. For example, if I pass two timestamps (T1 and T2) as parameters to the query, I want to get all the tasks that are active at some point in that range (that is, the overlapping records).

What is the best way to do this in Cassandra? I can't just use two range restrictions on Starts_On and Ends_On, because to add a range restriction on Ends_On I must have an equality check on Starts_On.
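To make "overlapping" concrete: a task overlaps the range [T1, T2] when it starts no later than T2 and ends no earlier than T1. The sketch below states that condition in Python; the function name `overlaps` and the sample timestamps are illustrative and not part of the actual schema.

```python
from datetime import datetime


def overlaps(starts_on, ends_on, t1, t2):
    """Return True if the task [starts_on, ends_on] shares any instant with [t1, t2]."""
    return starts_on <= t2 and ends_on >= t1


# Hypothetical query range: 09:00 to 17:00 on one day.
t1 = datetime(2015, 8, 24, 9, 0)
t2 = datetime(2015, 8, 24, 17, 0)

# A task that starts before the range and ends inside it overlaps.
early_task = overlaps(datetime(2015, 8, 24, 8, 0), datetime(2015, 8, 24, 10, 0), t1, t2)

# A task that finishes before the range begins does not.
finished_task = overlaps(datetime(2015, 8, 24, 6, 0), datetime(2015, 8, 24, 8, 0), t1, t2)
```

Both halves of the condition are range comparisons on different columns, which is exactly what a single CQL query on two clustering columns does not allow.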

+4
3 answers

Here's another idea (somewhat unconventional). You can create a user-defined function to implement the second range filter (in Cassandra 2.2 and later).

Suppose you define your table like this (shown using ints instead of timestamps, for a simple example):

CREATE TABLE tasks (
    p int, 
    task_id timeuuid, 
    start int, 
    end int, 
    end_range int static, 
    PRIMARY KEY(p, start));

Now we create a user-defined function to check the returned rows based on the end time and return the task_id of the matching rows, for example:

CREATE FUNCTION my_end_range(task_id timeuuid, end int, end_range int) 
    CALLED ON NULL INPUT RETURNS timeuuid LANGUAGE java AS 
    'if (end <= end_range) return task_id; else return null;';

Now I use a trick with the third parameter. Due to an apparent (major?) oversight, it seems you cannot pass a constant to a user-defined function. To get around this, we pass a static column (end_range) as our constant.

So, first we need to set the desired end_range:

UPDATE tasks SET end_range=15 where p=1;

Suppose the table contains this data:

SELECT * FROM tasks;

 p | start | end_range | end | task_id
---+-------+-----------+-----+--------------------------------------
 1 |     1 |        15 |   5 | 2c6e9340-4a88-11e5-a180-433e07a8bafb
 1 |     2 |        15 |   7 | 3233a040-4a88-11e5-a180-433e07a8bafb
 1 |     4 |        15 |  22 | f98fd9b0-4a88-11e5-a180-433e07a8bafb
 1 |     8 |        15 |  15 | 37ec7840-4a88-11e5-a180-433e07a8bafb

Now get the task_id values for rows with start >= 2 and end <= 15:

SELECT start, end, my_end_range(task_id, end, end_range) FROM tasks 
    WHERE p=1 AND start >= 2;

 start | end | test.my_end_range(task_id, end, end_range)
-------+-----+--------------------------------------------
     2 |   7 |       3233a040-4a88-11e5-a180-433e07a8bafb
     4 |  22 |                                       null
     8 |  15 |       37ec7840-4a88-11e5-a180-433e07a8bafb

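This returns the task_id for rows whose end is within the range, and null for the others (that check is done in the UDF). The start >= 2 condition is applied by the WHERE clause on the clustering column, and the end condition is applied by the UDF.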
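As a rough illustration of what the query above computes, here is a small in-memory Python simulation. It is only a sketch: `ROWS` copies the sample data shown earlier, `my_end_range` mirrors the UDF body, and `query` is a hypothetical stand-in for the SELECT, not a driver call.

```python
# Sample rows from partition p=1, as (start, end, task_id).
ROWS = [
    (1, 5, "2c6e9340-4a88-11e5-a180-433e07a8bafb"),
    (2, 7, "3233a040-4a88-11e5-a180-433e07a8bafb"),
    (4, 22, "f98fd9b0-4a88-11e5-a180-433e07a8bafb"),
    (8, 15, "37ec7840-4a88-11e5-a180-433e07a8bafb"),
]


def my_end_range(task_id, end, end_range):
    """Mirror of the UDF: return task_id when end <= end_range, otherwise None."""
    return task_id if end <= end_range else None


def query(rows, min_start, end_range):
    """Stand-in for the SELECT: filter on start, then apply the UDF to each row."""
    return [
        (start, end, my_end_range(task_id, end, end_range))
        for start, end, task_id in rows
        if start >= min_start
    ]


result = query(ROWS, min_start=2, end_range=15)

# The caller still has to drop the rows where the function returned None.
matching_ids = [task_id for _, _, task_id in result if task_id is not None]
```

Note that rows failing the end check still come back as nulls, so they are read and transferred and must be discarded by the caller.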


+2

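CQL does not let you put range restrictions on two clustering columns in one query. You can run the range query on start_on and then filter on end_on in your application.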

0

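Another option is to store the start and the end of each task as separate event rows, partitioned by userID (or another suitable key) and clustered by event time. For example: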

CREATE TABLE userEvents (
  userid UUID,
  eventTime TIMEUUID,
  eventType TEXT,
  eventDesc TEXT,
  PRIMARY KEY ((userid),eventTime,eventType));

Now I can query by userid with a range on eventtime:

SELECT userid,dateof(eventtime),eventtype,eventdesc FROM userevents 
  WHERE userid=dd95c5a7-e98d-4f79-88de-565fab8e9a68 
  AND eventtime >= mintimeuuid('2015-08-24 00:00:00-0500');

 userid                               | system.dateof(eventtime) | eventtype | eventdesc
--------------------------------------+--------------------------+-----------+-----------
 dd95c5a7-e98d-4f79-88de-565fab8e9a68 | 2015-08-24 08:22:53-0500 |       End |    event1
 dd95c5a7-e98d-4f79-88de-565fab8e9a68 | 2015-08-24 11:45:00-0500 |     Begin |     lunch
 dd95c5a7-e98d-4f79-88de-565fab8e9a68 | 2015-08-24 12:45:00-0500 |       End |     lunch

(3 rows)

A few things to note:

  • A task can appear with only one of its events in the result (event1 above shows only its End row), so you will need eventType and eventtime to work out which tasks fall in the range.
  • You will store each task twice (once for its start and once for its end). Duplication of data usually is not much of a concern in Cassandra, but I want to point it out explicitly.
  • In your case, you need to find a good partition key, as Task_ID will be too unique (high cardinality). This matters in Cassandra because you cannot run a range query on a partition key (only on a clustering key).
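The event-row approach leaves the pairing of Begin and End rows to the application. A minimal Python sketch of that step follows, under loud assumptions: the events have already been fetched with eventtime >= T1, eventDesc is used as the task identifier (a real table would more likely carry a task ID), and a task with a Begin row but no End row is treated as still running. The function name `overlapping_tasks` is hypothetical.

```python
from datetime import datetime


def overlapping_tasks(events, t2):
    """Return the tasks that overlap [T1, t2], given events with event_time >= T1.

    events: iterable of (event_time, event_type, event_desc) tuples.
    Assumption: event_desc identifies the task.
    """
    begins = {}      # task -> Begin time, when the Begin row is in the result
    seen = set()     # every task that has at least one row in the result
    for event_time, event_type, desc in events:
        seen.add(desc)
        if event_type == "Begin":
            begins[desc] = event_time
    result = []
    for desc in sorted(seen):
        begin = begins.get(desc)
        # No Begin row means the task started before T1 and ended at or after it.
        # Otherwise the task overlaps only if it began no later than t2.
        if begin is None or begin <= t2:
            result.append(desc)
    return result


# Sample rows from the query output above (time zone offsets omitted for brevity).
events = [
    (datetime(2015, 8, 24, 8, 22, 53), "End", "event1"),
    (datetime(2015, 8, 24, 11, 45, 0), "Begin", "lunch"),
    (datetime(2015, 8, 24, 12, 45, 0), "End", "lunch"),
]

until_noon = overlapping_tasks(events, datetime(2015, 8, 24, 12, 0, 0))
until_ten = overlapping_tasks(events, datetime(2015, 8, 24, 10, 0, 0))
```

With the sample rows, a range ending at 12:00 picks up both event1 and lunch, while a range ending at 10:00 picks up only event1, because lunch begins after it.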
0