Using Cassandra as an Event Store

I want to experiment with using Cassandra as an event store in an event sourcing application. My event storage requirements are fairly simple. The event schema will be something like this:

  • id : aggregate root object id
  • data : serialized event data (e.g. JSON)
  • timestamp : when the event occurred
  • sequence_number : unique version of the event

I am completely new to Cassandra, so forgive my ignorance in what follows. There are only two queries I will ever need for this data:

  • Give me all the events for a given aggregate root id
  • Give me all the events for a given aggregate root id where the sequence number is > x

My idea is to create a Cassandra table in CQL as follows:

CREATE TABLE events (
    id uuid,
    seq_num int,
    data text,
    timestamp timestamp,
    PRIMARY KEY (id, seq_num)
);

Does this seem like a reasonable way to model the problem? And, importantly, does the composite primary key allow the queries above to execute efficiently? Bear in mind that, in use, there may be a very large number of events (with different seq_num values) for the same aggregate root id.

My particular concern is that the second query will be inefficient in some way (I'm thinking of secondary indexes here...).

+7
cassandra
5 answers

Your design is well modeled in Cassandra terms. The queries you need are indeed supported by tables with composite keys; you would have something like:

  • query 1: select * from events where id = 'id_event' ;
  • query 2: select * from events where id = 'id_event' and seq_num > NUMBER ;

I don't think the second query will be inefficient; however, it may return many rows. If that is a concern, you can use the LIMIT keyword to cap the number of events returned.
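For instance, a capped version of the second query might look like this (the uuid and numbers are purely illustrative):

```cql
-- return at most 100 events after sequence number 1000
SELECT * FROM events
WHERE id = 550e8400-e29b-41d4-a716-446655440000
  AND seq_num > 1000
LIMIT 100;
```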

Using composite keys seems like a good fit for your specific requirements. Secondary indexes don't seem to add much to this table... unless I'm missing something in your design/requirements.

HTH.

+5

What you have is good, except in the case where there are very many events for a particular aggregate. One thing you can do is add static columns to hold "next" and "max_sequence". The idea is that the static columns hold the current maximum sequence number for the partition and an "artificial id" for the next partition. You could, say, store 100 or 1000 events per partition. What you have essentially done is split the events for an aggregate across several partitions. This means some extra overhead for queries and storage, but at the same time it protects against unbounded partition growth. You could even build a lookup table for an aggregate's partitions. It really depends on your use case and how clever you want to get.
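A sketch of this idea in CQL could look like the following; the bucket column and the static column names are illustrative, not a fixed recipe:

```cql
-- Split one aggregate's events across fixed-size partitions ("buckets").
CREATE TABLE events (
    id uuid,                 -- aggregate root id
    bucket int,              -- artificial partition component, e.g. seq_num / 1000
    seq_num int,
    data text,
    timestamp timestamp,
    max_sequence int static, -- highest seq_num stored in this partition
    next_bucket int static,  -- "artificial id" of the next partition, if full
    PRIMARY KEY ((id, bucket), seq_num)
);
```

The application computes the bucket on write (for example, seq_num / 1000) and updates the static columns as each partition fills up.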

+1

I used Cassandra for a very similar scenario (with 100k+ columns per row) and ended up with a model close to yours. I also agree with emgsilva that a secondary index probably will not bring much.

Three things were important for good performance in our event store: using composite columns, making sure the columns are in a useful sort order (Cassandra sorts the data in a row by column), and using compact storage if possible.

Note that compact storage means you can have only one value column, so all other columns must be part of the key.

For you, the schema would look like this:

    CREATE TABLE events (
        id uuid,
        seq_num int,
        timestamp timestamp,
        data text,
        PRIMARY KEY (id, seq_num, timestamp)
    ) WITH COMPACT STORAGE;
+1

Your partition can grow too large; you should create a composite partition key, or otherwise restructure it, to get the best performance for time-series modeling. For example:

    CREATE TABLE events (
        event_date int,
        id timeuuid,
        seq_num int,
        data text,
        PRIMARY KEY (event_date, id)
    );

This way, your id becomes a clustering column that guarantees the uniqueness of events, and your partition key (e.g. 20160922) groups all events for a day. You could change it to group by month instead. Also, avoid uuid; use timeuuid instead, since it already carries the timestamp information.
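With that schema, fetching a day's worth of events is a single-partition query (the date value is illustrative):

```cql
-- all events recorded on 22 September 2016
SELECT * FROM events WHERE event_date = 20160922;
```

Note that seq_num is no longer part of the primary key here, so filtering on it would need a different table or query strategy.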

0

I would not use your design to persist aggregates in an event store. You should save domain events instead, for flexibility. Let me explain: a domain event is the smallest piece of data that causes a change in application state. An aggregate root does not belong in the event store; it is meant for data exchange within a bounded context. If you store domain events, you can rebuild your data, including the aggregates, even with polyglot persistence. You can shape the model to your client's needs and constraints: for example, model the relationships between domain objects as a graph and use Neo4j, and separately model a flattened projection and store it in a document database. In other words, you keep the freedom to change the model and use whatever persistence mechanism is convenient; that is the point of polyglot persistence. I see two paths for your strategy: if you need event sourcing, model domain events and use Cassandra; if you only need the aggregate root data and no events, use a document database, which can also serve your two queries.

I hope this clears up some confusion regarding domain-driven design.

-3
