Erlang fault-tolerant application: AP or CA in CAP terms?

I already asked a question about a simple fault-tolerant soft real-time web application for a pizza delivery shop.


I received great comments and answers, but I do not agree that this is really a web service. It is rather a real-time system that takes orders from customers, dispatches those orders, and tracks the vehicles delivering them in real time.

In addition, unlike a "true" web service, this system is not intended for many users: only a few dispatchers (phone operators) and several delivery drivers will use it. At the moment I have no requirement to give real customers direct access to any services.

Therefore, this question is a little more general.

I found that in order to choose the right NoSQL data store for this application, the first thing I need to do is choose between CA, AP and CP in terms of the CAP theorem.

Now, the book Building Web Applications with Erlang says, "while [Mnesia] is not a SQL database, it is a CA database like a SQL database. It will not handle a network partition." The same book says that CouchDB is an AP database.

With that in mind, I think the very first thing I need to do with my application is to decide what the term “fault tolerance” means with respect to CAP.

One simple requirement is that the application be available 24/7 (R1). Another is that there is no need to scale: the application will have a very modest number of users (there will never be thousands of dispatchers) (R2).

Now, does R1 require the application to provide consistency, availability, and partition tolerance, and with what priorities?

What type of data store will cope best with the following tasks:

  • Ensuring round-the-clock availability for the dispatcher (the person who takes phone calls from customers and uses the CRM) to look up customer records and enter orders into the system;
  • Looking up the orders currently being served and their status (placed, baking, shipped, in delivery, delivered) in real time;
  • Keeping track of the location of all working vehicles and their payload in real time;
  • Recovering any part of the system after a system crash or network failure so that it continues to provide items 1, 2 and 3.

To summarize: which kind of data store (CA, AP, or CP) is best suited for the system described above? Which one best meets requirement R1?

+6
2 answers
  • For your 24/7 requirement, you are looking for a (highly) available database, because you want your requests to succeed every time (even if the results are just errors).
  • A netsplit will take down your entire system if you do not have partition tolerance.
  • Consistency is nice, but you can only have 2 out of 3.

Your best bet is an AP solution. I highly recommend a solution inspired by Amazon Dynamo. The best-known Dynamo-style implementations are Riak and CouchDB. Riak even lets you trade AP for other guarantees by tuning the number of read and write replicas.
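The read/write replica tuning mentioned above is usually described with three numbers: N replicas per key, R replicas that must answer a read, and W replicas that must acknowledge a write. Below is a minimal sketch of that arithmetic only. It does not call any Riak API, and the function names and the sample settings are assumptions chosen for illustration.

```python
def quorum_overlap(n: int, r: int, w: int) -> bool:
    """Return True when every read quorum must share a replica with every write quorum."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    # If r + w > n, a read set and a write set cannot be disjoint.
    return r + w > n


def tolerated_failures(n: int, quorum: int) -> int:
    """Number of replicas that may be unreachable while the quorum can still be met."""
    return n - quorum


# Hypothetical settings for N = 3 replicas.
N = 3
for r, w in [(1, 1), (2, 2), (1, 3)]:
    print(
        f"R={r} W={w}",
        "overlap" if quorum_overlap(N, r, w) else "no overlap",
        f"reads survive {tolerated_failures(N, r)} failures,",
        f"writes survive {tolerated_failures(N, w)} failures",
    )
```

Low R and W keep requests succeeding during a partition at the cost of possibly stale reads; raising them moves the system toward consistency and away from availability.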

+3

First, do not confuse the CAP "Availability" with "High Availability". They have nothing to do with each other. The A in CAP simply means: "All database nodes can respond to requests." To get high availability, you must be in several data centers, and you must have reliable, documented procedures for maintenance, expansion, etc. None of this depends on your CAP choice.

Secondly, be realistic about your requirements. A stock trading application may require 100% uptime, because every second of downtime can lose millions of dollars. Your pizza shop, on the other hand, would lose, I assume, tens of dollars for every minute of downtime. So it makes no sense to spend millions trying to prevent it. Try to calculate your actual costs.
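One rough way to do that calculation is to turn an availability target into minutes of downtime per year and multiply by the cost of a minute. This is only a sketch: the cost per minute and the availability targets below are made-up figures, not numbers from the question.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def annual_downtime_minutes(availability: float) -> float:
    """Expected minutes of downtime per year for a given availability (0..1)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


def annual_downtime_cost(availability: float, cost_per_minute: float) -> float:
    """Expected yearly loss, assuming every minute of downtime costs the same."""
    return annual_downtime_minutes(availability) * cost_per_minute


# Hypothetical: a pizza shop losing 20 dollars per minute of downtime.
for target in (0.99, 0.999, 0.9999):
    loss = annual_downtime_cost(target, cost_per_minute=20.0)
    print(f"availability {target}: about {loss:,.0f} dollars per year")
```

Comparing that yearly figure with the price of the extra infrastructure shows quickly whether another "nine" is worth paying for.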

Thirdly, always evaluate your choice against the baseline. You could simply go with CA (MySQL) and fail over to slaves quickly if problems occur. Be realistic about the costs (and risks) of adopting new technologies. If you really expect your system to run for 5 years without downtime, ask for evidence that someone else has run this database for 5 years without downtime.

If you go "AP" and you have remote people (drivers, etc.), you will need to write an application that stores its data on the phone and sends it in the background (with retries). Of course, you can do this regardless of whether your database is CA or AP.

If you need high uptime, you can:
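The store-and-send-with-retries idea can be sketched as a small outbox. This is an illustration under stated assumptions, not a mobile implementation: the `Outbox` class, its `send` callback, and the backoff values are hypothetical, and the queue lives in memory here, whereas a real phone app would persist it so that queued data survives a restart.

```python
import time
from collections import deque
from typing import Callable


class Outbox:
    """Queue messages locally and deliver them in order, retrying on failure."""

    def __init__(self, send: Callable[[dict], None], max_delay: float = 60.0):
        self._send = send              # hypothetical transport; raises ConnectionError when offline
        self._pending = deque()        # in-memory only; a real app would store this on disk
        self._max_delay = max_delay

    def submit(self, message: dict) -> None:
        """Accept a message immediately, whether or not the network is up."""
        self._pending.append(message)

    def flush(self, sleep: Callable[[float], None] = time.sleep,
              max_attempts: int = 5) -> int:
        """Try to send queued messages; return how many were delivered."""
        sent = 0
        delay = 1.0
        attempts = 0
        while self._pending and attempts < max_attempts:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                attempts += 1
                sleep(delay)                              # back off before retrying
                delay = min(delay * 2, self._max_delay)   # exponential backoff, capped
                continue
            self._pending.popleft()                       # remove only after a successful send
            sent += 1
            delay = 1.0
            attempts = 0
        return sent

    def __len__(self) -> int:
        return len(self._pending)
```

Because a message is removed only after a successful send, the server may see the same message twice if an acknowledgement is lost, so the receiving side should treat messages as idempotent (for example, by order or event ID).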

  • Increase MTBF (Mean Time Between Failures) - buy redundant power supplies, buy dual Ethernet cards, etc.

  • Decrease MTTR (Mean Time To Recovery): just make sure that when you fail, you can recover quickly (fail over to a slave).

I have seen people spend tens of thousands of dollars on MTBF, only to be down for 8 hours while they restore from backup. It makes sense to make sure your MTTR is low before attacking MTBF.
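The usual steady-state formula, availability = MTBF / (MTBF + MTTR), shows why. The sketch below compares doubling MTBF with shortening MTTR; all the hour figures are hypothetical and chosen only to illustrate the trade-off.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the system is up."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mtbf_hours must be positive and mttr_hours non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical baseline: one failure every 1000 hours, 8 hours to restore a backup.
baseline = availability(1000, 8)
# Option A: spend on hardware so that failures are half as frequent.
better_mtbf = availability(2000, 8)
# Option B: keep the hardware, but fail over to a slave in 15 minutes.
better_mttr = availability(1000, 0.25)

print(f"baseline:       {baseline:.5f}")
print(f"double MTBF:    {better_mtbf:.5f}")
print(f"15-minute MTTR: {better_mttr:.5f}")
```

With these made-up numbers, the fast failover improves availability more than doubling the time between failures.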

0

Source: https://habr.com/ru/post/923422/

