Using an RDBMS as event sourcing storage

If I were using an RDBMS (e.g. SQL Server) to store event sourcing data, what might the schema look like?

I have seen a few variations discussed in the abstract, but nothing concrete.

For example, suppose you have a "Product" entity, and changes to that product can come in the form of: Price, Cost, and Description. I am confused about whether I would:

  • Have a "ProductEvent" table containing all of the product's fields, where every change means a new row in that table, plus "who, what, where, why, when and how" (WWWWWH) as appropriate. When cost, price, or description changes, a whole new row is added to represent the Product.
  • Store product cost, price, and description in separate tables joined to the Product table with a foreign-key relationship. When those properties change, write new rows with the WWWWWH as appropriate.
  • Store the WWWWWH, plus a serialized object representing the event, in a "ProductEvent" table, meaning the event itself must be loaded, deserialized, and replayed in my application code to rebuild the application state for a given Product.

I worry particularly about option 2 above. Taken to the extreme, it would mean almost one table per property, and loading the application state for a given Product would require loading all the events for that product from every product-event table. This table explosion smells bad to me.
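
To make that concern concrete, here is a hypothetical sketch of what option 2 could look like for just one property (all names are made up for illustration):

    -- One table per changing property: this is the shape that multiplies
    -- with every new property added to the Product.
    CREATE TABLE ProductPriceEvent (
        Id         BIGINT IDENTITY(1,1) PRIMARY KEY,   -- event order for this property
        ProductId  UNIQUEIDENTIFIER NOT NULL,          -- FK back to the Product table
        Price      DECIMAL(18,2)    NOT NULL,
        ChangedBy  NVARCHAR(255)    NOT NULL,          -- part of the WWWWWH metadata
        ChangedAt  DATETIME2        NOT NULL
    );
    -- ...plus a near-identical ProductCostEvent, ProductDescriptionEvent, and so on;
    -- rebuilding one Product means querying every one of these tables.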

I am sure "it depends", and while there is no single "right answer", I am trying to get a feel for what is acceptable and what is totally unacceptable. I also know that NoSQL can help here, since events can be stored against an aggregate root, meaning only a single database request to get the events to rebuild the object, but we are not using a NoSQL database at the moment, so I am feeling around for alternatives.

+104
cqrs event-sourcing
Aug 15 '11 at 12:44
5 answers

The event store should not need to know about the specific fields or properties of events. Otherwise every modification of your model would result in having to migrate your database (just as in good old-fashioned state-based persistence). Therefore I would not recommend options 1 and 2 at all.

Below is the schema used in Ncqrs. As you can see, the Events table stores the related data as a CLOB (i.e. JSON or XML). This corresponds to your option 3 (except that there is no "ProductEvents" table, because you only need one generic "Events" table. In Ncqrs the mapping to your aggregate roots happens through the "EventSources" table, where each EventSource corresponds to an actual aggregate root.)

 Table Events:
     Id [uniqueidentifier] NOT NULL,
     TimeStamp [datetime] NOT NULL,
     Name [varchar](max) NOT NULL,
     Version [varchar](max) NOT NULL,
     EventSourceId [uniqueidentifier] NOT NULL,
     Sequence [bigint],
     Data [nvarchar](max) NOT NULL

 Table EventSources:
     Id [uniqueidentifier] NOT NULL,
     Type [nvarchar](255) NOT NULL,
     Version [int] NOT NULL
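
To rebuild a single aggregate from a schema like this, you load that aggregate's events in sequence order and deserialize the Data column in application code. A minimal sketch of such a query, assuming the layout above (my illustration, not code shipped with Ncqrs):

    -- Load all events for one aggregate root, oldest first, ready for replay.
    SELECT Name, Version, Sequence, Data
    FROM Events
    WHERE EventSourceId = @AggregateRootId
    ORDER BY Sequence;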

Jonathan Oliver's EventStore, in its SQL persistence implementation, consists essentially of a single table called "Commits" with a BLOB field "Payload". This is pretty much the same as in Ncqrs, except that it serializes the event's properties in binary format (which, for instance, adds encryption support).

Greg Young recommends a similar approach, as extensively documented on Greg's website.

The layout of his prototype Events table reads:

 Table Events:
     AggregateId [Guid],
     Data [Blob],
     SequenceNumber [Long],
     Version [Int]
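
Appends in this style are typically guarded by an optimistic concurrency check against the version the writer originally loaded. A hedged sketch of the idea in T-SQL (my illustration, not Greg's code; here Version is treated as the per-aggregate event number):

    -- Append only if nobody else has written a newer event since we loaded
    -- the aggregate at @ExpectedVersion; zero rows inserted means a conflict.
    INSERT INTO Events (AggregateId, Data, SequenceNumber, Version)
    SELECT @AggregateId, @Data, @NextSequenceNumber, @ExpectedVersion + 1
    WHERE NOT EXISTS (
        SELECT 1 FROM Events
        WHERE AggregateId = @AggregateId
          AND Version > @ExpectedVersion
    );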
+99
Aug 15 '11

The GitHub project CQRS.NET has some concrete examples of how you could build EventStores with several different technologies. At the time of writing there is an implementation in SQL using Linq2SQL together with a SQL schema to go with it, one for MongoDB, one for DocumentDB (CosmosDB if you're on Azure), and one using EventStore (as mentioned above). There are also implementations for Azure, such as Table Storage and Blob storage, which is very similar to flat-file storage.

I suppose the main point here is that they all adhere to the same principle/contract. They all store information in a single place/container/table, they use metadata to identify one event from another, and they "just" store the whole event as it was, in some cases serialized and in others in whatever form the supporting technology provides. So depending on whether you pick a document database, a relational database, or even flat files, there are several different ways to achieve the same intent of an event store (which is useful if you change your mind at any point and find you need to migrate or support more than one storage technology).

As a developer on the project, I can share some insight into some of the decisions we made.

Firstly, we found (even with unique UUIDs/GUIDs instead of integers) that for many strategic reasons sequential identifiers occur, so just having an ID was not unique enough for the key. We therefore merged our main ID key column with the data/object type to create what should be a truly unique key in the sense of your application. I know some people say you don't need to store the type, but that depends on whether you are greenfield or have to coexist with existing systems.
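
A minimal sketch of that composite-key idea (the table and column names are my own illustration, not the actual CQRS.NET schema):

    -- Neither the event id nor the aggregate id alone was unique enough when
    -- coexisting with other systems, so the key combines the id with the type.
    CREATE TABLE EventStore (
        AggregateId        UNIQUEIDENTIFIER NOT NULL,
        AggregateRootType  NVARCHAR(255)    NOT NULL,
        Version            BIGINT           NOT NULL,
        Data               NVARCHAR(MAX)    NOT NULL,
        CONSTRAINT PK_EventStore PRIMARY KEY (AggregateId, AggregateRootType, Version)
    );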

We settled on a single container/table/collection for maintainability reasons, but we did play around with a separate table per entity/object. In practice we found that this meant either the application needed "CREATE" permissions (which, generally speaking, is not a good idea... granted, there are always exceptions/exclusions), or each time a new entity/object came into existence or was deployed, new storage containers/tables/collections had to be created. We found this painfully slow for local development and problematic for production deployments. You may not, but that was our real-world experience.

Another thing to keep in mind is that requesting action X may end up causing many different events, so it is useful to know all the events generated by a command/event/whatever. They may also span different object types; for example, pressing "buy" in a shopping cart may trigger account and warehousing events. A consuming application may want to know all of this, so we added a CorrelationId. This means a consumer can ask for all the events raised as a result of their request (see the query sketch below). You'll see this in the schema.
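
Assuming the hypothetical EventStore table sketched above gains CorrelationId and Timestamp columns, such a request-scoped query might look like this (again my illustration, not the exact CQRS.NET schema):

    -- Everything raised as a result of one user request, across all object types.
    SELECT AggregateId, AggregateRootType, Version, Data, Timestamp
    FROM EventStore
    WHERE CorrelationId = @CorrelationId
    ORDER BY Timestamp;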

Specifically with SQL, we found that performance really becomes a bottleneck if indexes and partitions are not used adequately. Remember that events will need to be streamed in reverse order if you use snapshots. We tried several different indexes and found in practice that some additional indexes were needed for debugging in-production, real-world applications (a sketch follows below). Again, you'll see this in the schema.
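
For example, indexes along these lines support streaming one aggregate's events newest-first after a snapshot, plus the correlation-based debugging queries mentioned above (a hedged sketch with assumed names, not the actual CQRS.NET indexes):

    -- Serves "all events for this aggregate after the snapshot, newest first"
    -- without scanning the whole table.
    CREATE NONCLUSTERED INDEX IX_EventStore_Aggregate_VersionDesc
        ON EventStore (AggregateId, AggregateRootType, Version DESC);

    -- Speeds up the "what did this request cause?" investigation queries.
    CREATE NONCLUSTERED INDEX IX_EventStore_CorrelationId
        ON EventStore (CorrelationId);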

Other metadata was useful during production investigations; timestamps gave us insight into the order in which events were persisted versus raised. That gave us some assistance on a particularly heavily event-driven system that raised vast quantities of events, providing information about the performance of things like the network and the systems' distribution across it.

+7
Aug 6 '17 at 23:53

Well, you can take a look at Datomic.

Datomic is a database of flexible, time-based facts, supporting queries and joins, with elastic scalability and ACID transactions.

I wrote a detailed answer here

You can watch a talk by Stuart Halloway explaining the design of Datomic here.

Since Datomic stores facts over time, you can use it for event sourcing use cases, and much more.

+3
Jul 20 '13 at 8:37

A possible hint: a design following "Slowly Changing Dimension" (type = 2) can help you cover:

  • order of events (via surrogate key)
  • the durability of each state (valid from - valid to)

A left fold function should be fine to implement as well (a sketch follows below), but you need to think about future query complexity.
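
A minimal sketch of such a type-2 slowly changing dimension table for the Product example (names are my own illustration):

    -- Each change closes the previous row (ValidTo) and opens a new one, so both
    -- the order of events (surrogate key) and the lifetime of each state are kept.
    CREATE TABLE ProductHistory (
        SurrogateId  BIGINT IDENTITY(1,1) PRIMARY KEY,  -- gives the event order
        ProductId    UNIQUEIDENTIFIER NOT NULL,
        Price        DECIMAL(18,2)    NOT NULL,
        Cost         DECIMAL(18,2)    NOT NULL,
        Description  NVARCHAR(MAX)    NOT NULL,
        ValidFrom    DATETIME2        NOT NULL,
        ValidTo      DATETIME2        NULL              -- NULL marks the current state
    );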

+1
Mar 13 '15 at 0:06

I think solutions 1 and 2 can become a problem very quickly as your domain model evolves. New fields are created, some change their meaning, and some stop being used. Eventually your table will have dozens of nullable fields, and loading the events will be a mess.

Also, remember that the event store should be used only for writes; you only query it to load the events, not the properties of the aggregate. They are separate things (that is the essence of CQRS).

Solution 3 is what people usually do, and there are many ways to accomplish it.

For example, EventFlow CQRS when used with SQL Server creates a table with the following schema:

 CREATE TABLE [dbo].[EventFlow](
     [GlobalSequenceNumber] [bigint] IDENTITY(1,1) NOT NULL,
     [BatchId] [uniqueidentifier] NOT NULL,
     [AggregateId] [nvarchar](255) NOT NULL,
     [AggregateName] [nvarchar](255) NOT NULL,
     [Data] [nvarchar](max) NOT NULL,
     [Metadata] [nvarchar](max) NOT NULL,
     [AggregateSequenceNumber] [int] NOT NULL,
     CONSTRAINT [PK_EventFlow] PRIMARY KEY CLUSTERED
     (
         [GlobalSequenceNumber] ASC
     )
 )

Where:

  • GlobalSequenceNumber: a simple global ordering, which can be used for ordering events or identifying missing ones when building your projection (read model).
  • BatchId: an identification of the group of events that were inserted atomically (TBH, I have no idea why this would be useful).
  • AggregateId: identification of the aggregate.
  • Data: the serialized event.
  • Metadata: other useful information from the event (e.g. the event type used for deserialization, the timestamp, the originator id from the command, etc.).
  • AggregateSequenceNumber: the sequence number within the same aggregate (useful if you cannot have writes occurring out of order, so you use this field for optimistic concurrency).
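
To illustrate how these fields get used (my own sketch, not queries shipped with EventFlow): a projection can poll for events past its last checkpoint via GlobalSequenceNumber, and a unique index on (AggregateId, AggregateSequenceNumber) turns conflicting appends into a detectable error:

    -- Read-model side: pull everything after the last processed checkpoint.
    SELECT GlobalSequenceNumber, AggregateId, AggregateName, Data, Metadata
    FROM [dbo].[EventFlow]
    WHERE GlobalSequenceNumber > @LastProcessedGlobalSequenceNumber
    ORDER BY GlobalSequenceNumber;

    -- Write side: two writers appending the same AggregateSequenceNumber will
    -- violate this index, which is what optimistic concurrency relies on.
    CREATE UNIQUE INDEX IX_EventFlow_Aggregate_SequenceNumber
        ON [dbo].[EventFlow] (AggregateId, AggregateSequenceNumber);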

However, if you are building from scratch, I would recommend following the YAGNI principle and creating only the minimum required fields for your use case.

0
Jun 28 '19 at 15:00


