Microservice Data Consistency

While each microservice will generally have its own data, certain entities need to be consistent across multiple services.

Given such a requirement for data consistency in a highly distributed landscape like a microservice architecture, what are the design options? Of course, I don't want a shared-database architecture, where a single database manages the state across all services. That violates the isolation and shared-nothing principles.

I understand that a microservice can publish an event when an entity is created, updated, or deleted. All other microservices that are interested in this event can accordingly update related objects in their respective databases.

This works, but it requires a lot of careful, coordinated programming effort in each of the services.
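To make that effort concrete, here is a minimal Scala sketch (all names hypothetical) of the kind of handler every interested service ends up writing and maintaining:

```scala
// Events one service publishes about its entities (hypothetical shapes).
sealed trait CustomerEvent
case class CustomerCreated(id: String, name: String) extends CustomerEvent
case class CustomerUpdated(id: String, name: String) extends CustomerEvent
case class CustomerDeleted(id: String) extends CustomerEvent

// Each downstream service keeps its own local copy of what it needs.
trait LocalStore {
  def upsert(id: String, name: String): Unit
  def remove(id: String): Unit
}

class CustomerEventHandler(store: LocalStore) {
  // Every interested service repeats a handler like this, and each must
  // cover every event type plus ordering and retry concerns on its own.
  def handle(event: CustomerEvent): Unit = event match {
    case CustomerCreated(id, name) => store.upsert(id, name)
    case CustomerUpdated(id, name) => store.upsert(id, name)
    case CustomerDeleted(id)       => store.remove(id)
  }
}
```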

Can Akka or any other infrastructure solve this use case? How?

EDIT1:
Adding a chart below for clarity.
Basically, I am trying to understand whether there are frameworks available today that solve this data-consistency problem.

For the queue, I can use any AMQP software, such as RabbitMQ or Qpid. For a data-consistency framework, I'm not sure whether Akka or anything else can help today. Or is this scenario so unusual, and such an anti-pattern, that no framework should ever be needed for it?
[Diagram: an upstream service publishes entity-change events to an AMQP queue, and downstream services consume them to update their own databases.]

design-patterns akka microservices
6 answers

The microservices architectural style tries to let organizations have small teams that own services which are independent in development and at runtime. See this read. The hardest part is defining the service boundaries in a useful way. When you discover that the way you split up your application causes requirements to frequently affect multiple services, that tells you to rethink the service boundaries. The same is true when you feel a strong need to share data between services.

Thus, the general advice is to try very hard to avoid such scenarios. However, there may be cases where you cannot. Since a good architecture is about making the right trade-offs, here are some ideas.

  • Consider expressing the dependency via service interfaces (APIs) instead of a direct database dependency. That allows each service team to change its internal data schema as much as required and to worry only about the interface design where dependencies are involved. This is useful because it is easier to add additional APIs and slowly deprecate the old ones than to change a database design together with all dependent microservices (possibly at the same time). In other words, you can still deploy new microservice versions independently as long as the old APIs are still supported. This is the approach recommended by Amazon's CTO, who pioneered much of the microservices approach; the interview with him is a recommended read. A minimal sketch of the idea follows below.
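A minimal sketch of that idea in Scala, with invented names: the internal schema can be reshaped freely while both API versions stay available, so consumers migrate on their own schedule.

```scala
// Two versions of the same entity as exposed through the API.
case class CustomerV1(id: String, name: String)
case class CustomerV2(id: String, givenName: String, familyName: String)

// The dependency other services see: an interface, never the tables.
trait CustomerApi {
  def getCustomerV1(id: String): CustomerV1 // kept alive until consumers migrate
  def getCustomerV2(id: String): CustomerV2 // added alongside, not instead
}

// Internal storage (here a toy map of given/family names) can change at
// will; only these adapter methods must be kept working.
class CustomerService(db: Map[String, (String, String)]) extends CustomerApi {
  def getCustomerV2(id: String): CustomerV2 = {
    val (given, family) = db(id)
    CustomerV2(id, given, family)
  }
  def getCustomerV1(id: String): CustomerV1 = {
    val v2 = getCustomerV2(id)
    CustomerV1(id, v2.givenName + " " + v2.familyName) // old shape derived from new
  }
}
```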


Theoretical limitations

One important point to keep in mind is the CAP theorem:

In the presence of a network partition, one has to choose between consistency and availability. When choosing consistency over availability, the system will return an error or a timeout if particular information cannot be guaranteed to be up to date due to network partitioning.

Thus, by “requiring” certain entities to be consistent across multiple services, you increase the likelihood that you will have to resolve timeout problems.

Akka Distributed Data

Akka has a Distributed Data module for replicating data within a cluster:

All data entries are spread to all nodes, or nodes with a certain role, in the cluster via direct replication and gossip-based dissemination. You have fine-grained control of the consistency level for reads and writes.
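A minimal sketch against Akka's classic Distributed Data API (assuming Akka 2.6 with an akka-cluster setup; the key name and counter type are just for illustration):

```scala
import akka.actor.ActorSystem
import akka.cluster.ddata.{DistributedData, GCounter, GCounterKey, SelfUniqueAddress}
import akka.cluster.ddata.Replicator._
import scala.concurrent.duration._

object SharedCounter extends App {
  val system = ActorSystem("cluster")
  // Needed by the GCounter increment operation below.
  implicit val node: SelfUniqueAddress = DistributedData(system).selfUniqueAddress
  val replicator = DistributedData(system).replicator

  val counterKey = GCounterKey("visits")

  // Write with a tunable consistency level (local, majority, or all nodes);
  // the replicator disseminates the change to the rest of the cluster.
  replicator ! Update(counterKey, GCounter.empty, WriteMajority(3.seconds))(_ :+ 1)

  // Reads take a consistency level too.
  replicator ! Get(counterKey, ReadMajority(3.seconds))
}
```

Replies to Update and Get arrive as messages (UpdateSuccess, GetSuccess, and so on), so in production code you would run this inside an actor rather than fire-and-forget as above.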


Same problem here. We have data in different microservices, and there are cases where one service needs to know whether a certain entity exists in another microservice. We do not want the services to call each other synchronously to answer a request, since that adds response time and multiplies downtimes. It also creates a nightmare of deep coupling. The client should not be the one making decisions about business logic and data validation/consistency either. We also do not want central services such as "saga controllers" to provide the consistency between services.

Therefore, we use a Kafka message bus to inform observing services about state changes in the upstream services. We try very hard not to miss or ignore any messages, even under error conditions, and we use Martin Fowler's "Tolerant Reader" pattern to keep the coupling as loose as possible. Still, sometimes services change, and after the change they may need information from other services that those services may have published to the bus before, but which is now gone (even Kafka cannot store things forever).
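A small Scala sketch of the Tolerant Reader idea (the event shape is invented): read the message as a loose field map and require only what this service actually needs.

```scala
// The only projection of the upstream event this service cares about.
case class OrderRef(orderId: String, customerId: Option[String])

// Require only the one field we truly need; treat everything else as
// optional and silently ignore fields we do not know about, so upstream
// can add fields without breaking us.
def readOrderEvent(fields: Map[String, String]): Option[OrderRef] =
  fields.get("orderId").map { id =>
    OrderRef(id, fields.get("customerId"))
  }
```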

For now, we have decided that each service is split into a pure, decoupled web service (RESTful) that does the actual work, and a separate connector service that listens on the bus and may also call other services. The connector runs in the background and is triggered only by messages from the bus. It then tries to add the data to the main service via REST calls. If the main service responds with a consistency error, the connector tries to repair that by fetching the needed data from the upstream service and injecting it as required. (We cannot afford batch jobs that "synchronize" data in bulk, so we just fetch what we need.) If there are better ideas, we are always open to them, but "pull everything" or "just change the data model" is not what we consider feasible...
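A hypothetical sketch of that connector loop, with all service and method names invented:

```scala
sealed trait RestResult
case object Ok extends RestResult
case class MissingDependency(id: String) extends RestResult

// Stand-ins for REST clients to the main and upstream services.
trait MainService {
  def applyEvent(event: String): RestResult
  def inject(entity: String): Unit
}
trait UpstreamService {
  def fetchEntity(id: String): String
}

class Connector(main: MainService, upstream: UpstreamService) {
  // Triggered by the bus only: push the event into the main service and,
  // on a consistency error, heal by pulling just the missing entity.
  def onBusMessage(event: String): Unit =
    main.applyEvent(event) match {
      case Ok => ()
      case MissingDependency(id) =>
        main.inject(upstream.fetchEntity(id)) // fetch only what we need
        main.applyEvent(event)                // retry the original event
    }
}
```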


I think there are two main forces here:

  • decoupling: this is why you have microservices in the first place, and why you want a share-nothing approach to data storage
  • the consistency requirement: if I understood correctly, you are already fine with eventual consistency

The diagram makes perfect sense to me, but I don't know of any framework that does this out of the box, probably because of the many trade-offs involved in any particular case. I would approach the problem as follows:

The upstream service emits events to the message bus, as you showed. For serialization, I would carefully choose a wire format that does not couple the producer and the consumer too tightly; the ones I know of are protobuf and Avro. You can then evolve your event model upstream without changing the downstream services, as long as they do not care about the newly added fields, and do a rolling upgrade when they do.

Downstream services subscribe to the events; the message bus must provide fault tolerance. We use Kafka for this, but since you chose AMQP, I assume it gives you what you need.

In case of network failures (for example, a downstream consumer cannot connect to the broker), if you favor (eventual) consistency over availability, you can refuse to serve requests that rely on data you know may be more stale than some preconfigured threshold.
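A minimal sketch of such a staleness threshold, with invented names:

```scala
import java.time.{Duration, Instant}

// A local replica tagged with when it was last synced from upstream.
case class Replica[A](value: A, lastSyncedAt: Instant)

// Prefer consistency over availability: refuse to answer from a local
// copy that may be more stale than the configured threshold.
def read[A](replica: Replica[A], maxStaleness: Duration): Either[String, A] = {
  val age = Duration.between(replica.lastSyncedAt, Instant.now)
  if (age.compareTo(maxStaleness) > 0)
    Left("503: local data may be stale beyond the configured threshold")
  else
    Right(replica.value)
}
```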


I think you can approach this problem from two perspectives: service collaboration and data modeling.

Service Collaboration

Here you can choose between service orchestration and service choreography. You have already mentioned messaging events between services; this would be the choreography approach, which, as you said, can work, but it involves writing code in each service that deals with the messaging part. I'm sure there are libraries for that. Alternatively, you can choose service orchestration, where you introduce a new composite service, the orchestrator, which is responsible for managing the data updates across services. Because the data-consistency management is now extracted into a separate component, the orchestrator lets you switch between eventual and strong consistency without touching the downstream services. A sketch follows below.
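A minimal sketch of the orchestrator variant, all names invented:

```scala
// A client stub for each participating service; returns success/failure.
trait ServiceClient {
  def updateCustomer(id: String, name: String): Boolean
}

class CustomerOrchestrator(services: List[ServiceClient]) {
  // Strong-ish consistency: apply the update everywhere and report the
  // services that failed, so they can be retried or compensated. Swapping
  // this loop for fire-and-forget messaging would give eventual consistency
  // without touching the downstream services themselves.
  def updateEverywhere(id: String, name: String): List[ServiceClient] =
    services.filterNot(_.updateCustomer(id, name))
}
```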

Data Modeling

You can also choose to redesign the data models of the participating microservices and extract the entities that need to be consistent across multiple services into relationships managed by a dedicated relationship microservice. Such a microservice would be somewhat similar to the orchestrator, but the coupling would be reduced because the relationships can be modeled in a generic way, as sketched below.
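A sketch of such a generic relationship model (invented names): the dedicated service stores links between entities owned elsewhere, never the entities themselves.

```scala
// A pointer to an entity owned by some other service.
case class EntityRef(service: String, id: String)
// A typed link between two such entities.
case class Relation(kind: String, from: EntityRef, to: EntityRef)

class RelationshipService {
  private var relations = Set.empty[Relation]

  def link(r: Relation): Unit   = relations += r
  def unlink(r: Relation): Unit = relations -= r

  // Generic query: all entities related to `e` by a given kind of link.
  def related(e: EntityRef, kind: String): Set[EntityRef] =
    relations.collect { case Relation(`kind`, `e`, to) => to }
}
```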


“update related objects in their respective databases accordingly” → data duplication → FAIL.

Using events to update other databases is identical to caching, which brings in the cache-consistency problem, and that is exactly the issue behind your question.

Keep your local databases as separated as possible and use pull semantics instead of push: make RPC calls when you need data, and be prepared to gracefully handle possible errors such as timeouts, missing data, or service unavailability. Akka or Finagle gives you enough tools for that.
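A sketch of that pull style using plain Scala futures (the RPC itself is stubbed out; in practice it would be an Akka, Finagle, or HTTP call):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Stand-in for the real remote call to the service that owns the data.
def fetchCustomerName(id: String): Future[String] =
  Future { "Alice" }

// Pull on demand, with a deadline and a graceful fallback instead of a
// locally cached copy that could silently go stale.
def customerNameOrFallback(id: String): String =
  try Await.result(fetchCustomerName(id), 200.millis)
  catch { case _: Exception => "<unavailable>" } // timeout / service down
```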

This approach can hurt performance, but at least you can choose what to trade and where. Possible ways to reduce latency and increase throughput:

  • Scale out the data-providing services so they can handle more requests per second at lower latency.
  • Use local caches with a short expiration time. This gives you eventual consistency but really helps performance (see the sketch after this list).
  • Use a distributed cache directly and face the cache-consistency problem head-on.
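A sketch of the short-TTL cache from the second bullet, with invented names: entries expire quickly, trading a bounded window of staleness for latency.

```scala
import java.time.Instant
import scala.collection.mutable

class TtlCache[K, V](ttlSeconds: Long) {
  private val entries = mutable.Map.empty[K, (V, Instant)]

  def getOrFetch(key: K)(fetch: => V): V = {
    val now = Instant.now
    entries.get(key) match {
      // Fresh enough: serve the local copy without a remote call.
      case Some((v, at)) if at.plusSeconds(ttlSeconds).isAfter(now) => v
      // Missing or expired: pull from the owning service and remember it.
      case _ =>
        val v = fetch
        entries(key) = (v, now)
        v
    }
  }
}
```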
