Data synchronization across multiple occasionally connected clients using Event Sourcing (NodeJS, MongoDB, JSON)

I am stuck on the implementation of data synchronization between a server and several clients. I have read about Event Sourcing and would like to use it for the synchronization part.

I know this is not so much a technical question as a conceptual one.

I would simply send all events to the server live, but the clients are meant to be usable offline from time to time.

This is the basic concept: [Visual concept diagram]

The server stores all the events that every client should know about. It does not replay these events to serve data itself, since the main goal is to synchronize the events between the clients, allowing them to replay all events locally.

Clients have their own JSON store, also saving all events, and rebuild their various collections from the saved/synchronized events.

Because clients can change data offline, strictly sequential synchronization cycles are not possible. With this in mind, the server should handle conflicts when merging diverging events and ask the specific user to resolve a conflict when one occurs.

So the main problem for me is working out the difference between client and server, so that I do not have to send all events to the server. I am also unsure about the order of synchronization: push changes first, or pull changes first?

What I have built so far is a default MongoDB implementation on the server that isolates all documents of a given user group in all my queries (currently only authentication and the server-side database work). On the client, I created a wrapper around a NeDB datastore that lets me intercept all query operations so I can create and manage events per query, while keeping the default query behavior intact. I also compensated for the different ID systems of NeDB and MongoDB by implementing custom IDs that are generated by the clients and stored as part of the document data, so that recreating a database does not mess up the IDs (during synchronization these IDs should be consistent across all clients).
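To make that wrapper idea concrete, here is a minimal sketch of the concept (simplified, not my actual code; the `EventedStore` name, the separate event-log datastore, and the `client-42` creator id are placeholders):

    // Minimal sketch: decorate a NeDB datastore so every write also appends
    // an event, and stamp documents with a client-generated global id (gid).
    const Datastore = require('nedb');
    const crypto = require('crypto');

    class EventedStore {
      constructor(collection, filename, eventLog) {
        this.collection = collection;
        this.db = new Datastore({ filename, autoload: true });
        this.eventLog = eventLog; // another NeDB datastore holding the events
      }

      _record(type, target, data) {
        this.eventLog.insert({
          type, collection: this.collection, target, data,
          timestamp: Date.now(),
          creator: 'client-42', // placeholder client identifier
        });
      }

      insert(doc, cb) {
        doc.gid = doc.gid || crypto.randomUUID(); // custom id, part of the data
        this.db.insert(doc, (err, newDoc) => {
          if (!err) this._record('create', doc.gid, newDoc);
          cb(err, newDoc);
        });
      }

      update(gid, changes, cb) {
        this.db.update({ gid }, { $set: changes }, {}, (err, n) => {
          if (!err && n > 0) this._record('update', gid, changes);
          cb(err, n);
        });
      }

      remove(gid, cb) {
        this.db.remove({ gid }, {}, (err, n) => {
          if (!err && n > 0) this._record('remove', gid, null);
          cb(err, n);
        });
      }
    }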

The event format will look something like this:

    {
      type: 'create/update/remove',
      collection: 'CollectionIdentifier',
      target: ?ID,   // the global custom ID of the affected document
      data: {},      // the inserted/updated data
      timestamp: '',
      creator:       // some way to identify the author of the change
    }

To save some memory on the clients, I will create snapshots every certain number of events, so that a client never has to replay the complete event history.
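A rough sketch of how snapshot plus replay could work, assuming an in-memory state object and the event format above (`applyEvent` and the interval are illustrative):

    const SNAPSHOT_INTERVAL = 100; // snapshot every 100 events (arbitrary)

    // Reducer: applies a single event to the materialized collections.
    function applyEvent(state, ev) {
      const coll = (state[ev.collection] = state[ev.collection] || {});
      if (ev.type === 'create') coll[ev.target] = ev.data;
      else if (ev.type === 'update') Object.assign(coll[ev.target], ev.data);
      else if (ev.type === 'remove') delete coll[ev.target];
      return state;
    }

    // Rebuild from the latest snapshot, replaying only the events after it.
    function rebuild(snapshot, events) {
      let state = snapshot ? JSON.parse(JSON.stringify(snapshot.state)) : {};
      const start = snapshot ? snapshot.lastEventIndex + 1 : 0;
      for (let i = start; i < events.length; i++) state = applyEvent(state, events[i]);
      return state;
    }

    // Persist a new snapshot once enough events have accumulated.
    function maybeSnapshot(state, events) {
      if (events.length % SNAPSHOT_INTERVAL !== 0) return null;
      return { state: JSON.parse(JSON.stringify(state)), lastEventIndex: events.length - 1 };
    }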

So, to narrow down the problem: replaying events on the client side works, creating and storing events on both client and server works too, and merging the events on the server should not be a problem either. Replicating the whole database with existing tools is not an option, because I only synchronize certain parts of the database (not even whole collections, since documents are assigned to different groups within which they should be synchronized).

But I am having problems with :

  • Determining which events to send to the client during synchronization (avoiding duplicate events, or having to send all of them)
  • Determining which events the client should send back to the server (again avoiding duplicates or a full resend)
  • The correct order of synchronization (push or pull changes first?)

A side question I would also like to ask: would it be more efficient to store the updates directly on the documents, in a revision-like style?

If my question is unclear, a duplicate (I found a few similar questions, but they did not help in my scenario), or missing information, please leave a comment and I will maintain it as best I can, since I tried to write down everything that could help you understand the concept.

Thanks in advance!

javascript synchronization mongodb event-sourcing
2 answers

This is a very broad question, but I will try to answer it in some form.

My first reflex upon seeing your diagram was to think about how distributed databases replicate data among themselves and recover when a node goes down. This is most often done via gossiping.

Gossip rounds make sure that data stays in sync. Timestamped versions are kept on both ends and merged on demand, for example when a node reconnects, or simply on a given interval (publishing bulk updates via socket, etc.).

Database engines such as Cassandra or Scylla use three messages per gossip round.

Demonstration:

Data in node A

    { id: 1, timestamp: 10, data: { foo: '84' } }
    { id: 2, timestamp: 12, data: { foo: '23' } }
    { id: 3, timestamp: 12, data: { foo: '22' } }

Data in node B

    { id: 1, timestamp: 11, data: { foo: '50' } }
    { id: 2, timestamp: 11, data: { foo: '31' } }
    { id: 3, timestamp: 8, data: { foo: '32' } }

Step 1: SYN

Node A lists the ids and the last upserted timestamps of all its documents (feel free to change the structure of these data packets; here I use verbose JSON to better illustrate the process).

Node A -> Node B

    [
      { id: 1, timestamp: 10 },
      { id: 2, timestamp: 12 },
      { id: 3, timestamp: 12 }
    ]

Step 2: ACK

On receiving this packet, Node B compares the received timestamps with its own. For each document: if its own timestamp is older, it puts just the id and its timestamp in the ACK payload; if its own timestamp is newer, it includes its newer data alongside. And if the timestamps are equal, nothing needs to be done, obviously.

Node B -> Node A

    [
      { id: 1, timestamp: 11, data: { foo: '50' } },
      { id: 2, timestamp: 11 },
      { id: 3, timestamp: 8 }
    ]

Step 3: ACK2

Node A updates its documents where the ACK provided data, and then sends its latest data to Node B for the entries where the ACK provided no data.

Node A -> Node B

    [
      { id: 2, timestamp: 12, data: { foo: '23' } },
      { id: 3, timestamp: 12, data: { foo: '22' } }
    ]

This way, both nodes end up with the latest data merged in both directions (covering the case where a client worked offline), without having to send all documents.
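To make the three steps concrete, here is a minimal sketch in JavaScript, with each node's data held as a `Map` of `id -> { timestamp, data }`. The function names are mine for illustration, not an API of Cassandra or Scylla:

    // Step 1 (SYN): the initiating node summarizes what it has.
    function buildSyn(store) {
      return [...store].map(([id, doc]) => ({ id, timestamp: doc.timestamp }));
    }

    // Step 2 (ACK): the receiver answers. Entries it has newer data for carry
    // that data; entries it has older data for just echo its own timestamp,
    // so the initiator knows to send its newer version back.
    function buildAck(store, syn) {
      const ack = [];
      for (const { id, timestamp } of syn) {
        const mine = store.get(id);
        if (!mine || mine.timestamp < timestamp) {
          ack.push({ id, timestamp: mine ? mine.timestamp : 0 });
        } else if (mine.timestamp > timestamp) {
          ack.push({ id, timestamp: mine.timestamp, data: mine.data });
        }
        // equal timestamps: already in sync, send nothing
      }
      return ack;
    }

    // Step 3 (ACK2): the initiator applies received data, then sends back
    // its latest version of every entry the receiver asked for.
    function buildAck2(store, ack) {
      const ack2 = [];
      for (const entry of ack) {
        if (entry.data) {
          store.set(entry.id, { timestamp: entry.timestamp, data: entry.data });
        } else {
          const mine = store.get(entry.id);
          ack2.push({ id: entry.id, timestamp: mine.timestamp, data: mine.data });
        }
      }
      return ack2;
    }

    function applyAck2(store, ack2) {
      for (const e of ack2) store.set(e.id, { timestamp: e.timestamp, data: e.data });
    }

    // One full round between the Node A and Node B data from above:
    const nodeA = new Map([
      [1, { timestamp: 10, data: { foo: '84' } }],
      [2, { timestamp: 12, data: { foo: '23' } }],
      [3, { timestamp: 12, data: { foo: '22' } }],
    ]);
    const nodeB = new Map([
      [1, { timestamp: 11, data: { foo: '50' } }],
      [2, { timestamp: 11, data: { foo: '31' } }],
      [3, { timestamp: 8,  data: { foo: '32' } }],
    ]);
    applyAck2(nodeB, buildAck2(nodeA, buildAck(nodeB, buildSyn(nodeA))));
    // Both maps now hold ids 1-3 at timestamps 11, 12, 12 with the merged data.
    // (Entries only the receiver has would need a symmetric round; omitted here.)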

In your case the server is your source of truth, but you could easily implement peer-to-peer gossiping between your clients, over WebRTC for example.

Hope this helps in some way.

Cassandra instructional video

Scylla explanation


I think the best solution to avoid issues with ordering and duplicate events is to use the pull method. That way, each client keeps track of the last event it imported (for example with a sequence tracker) and requests from the server only the events generated after that one.
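A minimal sketch of that pull loop (the `/api/events` endpoint, the `seq` field, and the `fetch` transport are assumptions for illustration, and `applyEvent` is whatever reducer replays an event into the local collections):

    // Client-side pull, assuming the server tags each stored event with a
    // monotonically increasing sequence number `seq`.
    let lastSeq = 0; // persisted in the client's local store between sessions

    async function pullNewEvents(applyEvent, localState) {
      const res = await fetch(`/api/events?after=${lastSeq}`);
      const events = await res.json(); // server returns events ordered by seq
      for (const ev of events) {
        applyEvent(localState, ev); // replay into the local collections
        lastSeq = ev.seq;           // advance the tracker only after applying
      }
      // persist lastSeq here so the next pull resumes where this one stopped
    }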

An interesting problem will be detecting violations of business invariants. For that, you could also store the application's command log on the client, and in case of a conflict (events were generated by other clients in the meantime), re-execute the commands from the command log. You must do this because some commands will no longer succeed when re-executed; for example, when a client saves a document after another user has deleted that same document concurrently.
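A sketch of what that replay could look like (the command shape and `execute` are hypothetical):

    // Replay the client's own command log against the merged state, assuming
    // execute(state, cmd) throws when a command no longer applies.
    function replayCommands(state, commandLog) {
      const conflicts = [];
      for (const cmd of commandLog) {
        try {
          execute(state, cmd); // re-run the command against the merged state
        } catch (err) {
          conflicts.push({ cmd, reason: err.message });
        }
      }
      return conflicts; // surface these to the user instead of merging silently
    }

    function execute(state, cmd) {
      if (cmd.type === 'update') {
        const coll = state[cmd.collection];
        if (!coll || !coll[cmd.target]) throw new Error('document no longer exists');
        Object.assign(coll[cmd.target], cmd.data);
      }
      // other command types (create, remove, ...) handled analogously
    }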

