How to properly handle asynchronous database replication?

I am considering using Amazon RDS with read replicas to scale our database.

Some of the controllers in our web application are read/write, some are read-only. We already have an automated way of determining which controllers are read-only, so my first approach would be to open a connection to the master when a read/write controller is requested, and a connection to a read replica when a read-only controller is requested.
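To make the idea concrete, here is a minimal sketch of the routing I have in mind; the DSN strings and the `pick_dsn()` helper are placeholders of mine, not a real API, and any driver with a `connect(dsn)`-style call would slot in behind it:

```python
# Sketch of the proposed per-controller routing; DSNs are placeholders.
import random

MASTER_DSN = "postgresql://master.example.internal/app"
REPLICA_DSNS = [
    "postgresql://replica-1.example.internal/app",
    "postgresql://replica-2.example.internal/app",
]

def pick_dsn(controller_is_read_only: bool) -> str:
    """Read/write controllers must see the master; read-only ones may lag."""
    if controller_is_read_only:
        return random.choice(REPLICA_DSNS)  # may be seconds behind the master
    return MASTER_DSN
```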

In theory, that sounds good. But then I came across the concept of replication lag, which means a replica can lag behind the master by a few seconds.

Imagine the following usage example:

  • The browser is sent to /create-account , which is read/write, and therefore connects to the master
  • An account is created, the transaction commits, and the browser is redirected to /member-area
  • The browser opens /member-area , which is read-only, and therefore connects to a replica. If the replica is even slightly behind the master, the user's account may not yet exist on the replica, resulting in an error.

How do you actually use read replicas in your application to avoid these potential problems?

2 answers

This is a hard problem, and there are many possible solutions. One approach is to look at what Facebook has done:

TL;DR: read requests go to a read replica, but if you perform a write, then for the next 20 seconds all of your reads go to the master that recorded the write.

Another big problem we had to solve was that only our master databases in California could accept write operations. This meant we had to avoid serving pages that write to the database from Virginia, because every write would have had to cross the country to our master databases in California. Fortunately, our most frequently accessed pages (home page, profiles, photos) don't perform writes under normal operation. So the problem became: when a user makes a request for a page, how do we decide whether it is "safe" to send to Virginia, or whether it must be routed to California?

This question turned out to have a relatively straightforward answer. One of the first servers a user's request to Facebook hits is called a load balancer; this machine's primary responsibility is to pick a web server to handle the request, but it serves a number of other purposes as well: protecting against denial-of-service attacks and multiplexing user connections, to name a few. This load balancer can run in layer 7 mode, where it inspects the URI of the user's request and makes routing decisions based on that information. This feature made it easy to tell the load balancer about our "safe" pages, and it could decide whether to send a request to Virginia or California based on the page name, the user's location, and so on.
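Expressed as code, the routing decision described there might look roughly like this; the path list and region names are invented to illustrate the quoted description, not Facebook's actual configuration:

```python
# Illustrative layer-7 routing decision: pick a datacenter from the request
# path alone. SAFE_PATHS is an invented example list, not a real config.
SAFE_PATHS = {"/home.php", "/profile.php", "/photos.php"}

def route_by_path(path: str) -> str:
    """Send 'safe' (read-only) pages to the replica region."""
    if path in SAFE_PATHS:
        return "virginia"   # replica datacenter; reads may lag slightly
    return "california"     # master datacenter; all writes must land here
```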

There is one more wrinkle in this problem, however. Say you go to editprofile.php to change your hometown. This page isn't marked as safe, so it gets routed to California and you make your change. Then you go to view your profile and, since that is a safe page, we send you to Virginia. Because of the replication lag we talked about earlier, though, you might not see the change you just made! That experience is very confusing for the user, and it also leads to double posting. We got around this problem by setting a cookie in your browser with the current time whenever you write something to our databases. The load balancer also looks for that cookie and, if it notices that you wrote something within the last 20 seconds, unconditionally sends you to California. Then, once 20 seconds have passed and we're sure the data has replicated to Virginia, we allow you back onto the safe pages.
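A minimal sketch of that cookie trick, assuming a cookie named `last_write_at` (the name is mine); only the 20-second window comes from the answer above:

```python
import time

STICKY_WINDOW_SECONDS = 20  # the window quoted in the answer

def cookie_for_write() -> dict:
    """Cookie to set whenever a request writes to the database."""
    return {"last_write_at": str(int(time.time()))}

def must_go_to_master(cookies: dict) -> bool:
    """True while the user's last write might not have replicated yet."""
    raw = cookies.get("last_write_at")
    if raw is None:
        return False
    try:
        last_write = int(raw)
    except ValueError:
        return False  # malformed cookie: treat as no recent write
    return time.time() - last_write < STICKY_WINDOW_SECONDS
```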


I worked on an application that used pseudo-vertical partitioning. Since only a small amount of its data was time-sensitive, the application usually read from the slaves and only queried the master in specific cases.

As an example: when a user changes their password, the application always authenticates against the master. When changing non-time-sensitive data (for example, "user preferences"), it displays a success dialog along with a note that it may take a little while until everything is updated.
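In code, that split amounted to something like the following sketch; the table names are invented, and `master` / `replica` stand for two already-opened connections:

```python
# Hypothetical sketch of the pseudo-vertical split: time-sensitive data
# always comes from the master, the rest from a possibly lagging slave.
TIME_SENSITIVE_TABLES = {"credentials", "sessions"}  # example names

def connection_for(table, master, replica):
    """Choose a connection based on how fresh the table's data must be."""
    return master if table in TIME_SENSITIVE_TABLES else replica
```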

Some other ideas that may or may not work depending on the environment:

  • After updating an entity, compute its checksum and store it in the application cache; when fetching the data, always check that the checksum matches (see the sketch after this list)
  • Use browser storage / a cookie to hold the delta locally, so the user always sees the latest version
  • Add an "updated" flag and invalidate it synchronously on each slave node before/after the update
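A sketch of the first idea, with `cache` and `fetch_from_replica` as assumed helpers: at write time the application stores a checksum of the entity, and at read time it retries against the replica until the replica's copy matches.

```python
import hashlib
import json
import time

def entity_checksum(entity: dict) -> str:
    """Stable checksum over an entity's JSON-serializable fields."""
    return hashlib.sha256(
        json.dumps(entity, sort_keys=True).encode()
    ).hexdigest()

def read_when_consistent(entity_id, cache, fetch_from_replica,
                         retries=3, delay=0.2):
    """Re-read from the replica until it matches the checksum cached at write time."""
    expected = cache.get(f"checksum:{entity_id}")
    entity = None
    for _ in range(retries):
        entity = fetch_from_replica(entity_id)
        if expected is None or entity_checksum(entity) == expected:
            return entity
        time.sleep(delay)  # replica still behind; wait briefly and retry
    return entity  # give up and serve possibly stale data
```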

Regardless of which solution you choose, keep in mind the constraints of the CAP theorem.

