Does it make sense to use stateful web servers?

I am working on a web application that has historically been built on the PHP / MySQL stack.

One of the key operations of the application was a heavy calculation that required iterating over every row of an entire database table. Needless to say, this was a serious bottleneck. Therefore, it was decided to rewrite the whole process in Java.

This gave us two advantages. First, Java as a language is much faster than the PHP runtime. Second, we could hold the entire data set in the memory of the Java application server. So now we do the heavy calculations in memory, and everything is much faster.

This worked for a while until we realized that we needed to scale, so we now need more web servers.

The problem is that under the current design, all of them must maintain the same state: each one queries the database, processes the data, and stores it in memory. But what happens when that data needs to change? How do all the servers stay consistent?

This architecture seems wrong to me. The benefit of keeping all the data in memory is obvious, but it makes scaling very difficult.

What are the options from here? Switch to an in-memory key-value store? Should we abandon keeping state inside the web servers entirely?

+8
web-applications architecture scalability
4 answers

Switch to Erlang now :-)

Yes, that's a joke, but there is some truth to it. The problem is that you originally had your state in external shared storage: the DB. Now you have it (partially) pre-calculated in internal non-shared storage: Java objects in RAM. The obvious way out is to still have it pre-calculated, but in external shared storage, the faster the better.

One simple answer: memcached.
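The cache-aside pattern that a shared cache like memcached enables might be sketched like this. This is a minimal, self-contained illustration: a `ConcurrentHashMap` stands in for the external cache cluster, and with a real memcached client you would issue network get/set/delete calls against the shared cluster instead:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Cache-aside sketch: the shared store here is a ConcurrentHashMap, standing
// in for an external cache such as memcached. Every web server would talk to
// the same external cache instead of holding its own private copy of the data.
public class CacheAside {
    private final Map<String, Long> cache = new ConcurrentHashMap<>();

    // On a cache miss, run the expensive calculation once and publish the result.
    public long getOrCompute(String key, Function<String, Long> heavyCalc) {
        return cache.computeIfAbsent(key, heavyCalc);
    }

    // When the underlying data changes, evict so the next read recomputes.
    public void invalidate(String key) {
        cache.remove(key);
    }
}
```

With memcached, `invalidate` becomes a `delete` against the shared cluster, so every web server sees the change at once instead of each holding a stale private copy.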

Another way is to build your own "calc" server that centralizes both the calculation work and the (partial) results, with the web frontends going through this server for access. In Erlang this would be the natural way to do it. In other languages you can do it too, it is just more work. Check out ZeroMQ for inspiration, even if you don't end up using it (it's a damn good implementation, though).
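A "calc" server of the kind described can be sketched with the JDK's built-in `com.sun.net.httpserver`; the port, path, and `heavyCalc` placeholder below are made up for illustration, not taken from the original application:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a centralized "calc" server: one process owns the data set and
// the (partial) results, and the web servers call it over HTTP instead of
// each loading the data themselves.
public class CalcServer {
    static final Map<String, Long> results = new ConcurrentHashMap<>();

    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/result", exchange -> {
            String key = exchange.getRequestURI().getQuery(); // e.g. "row42"
            long value = results.computeIfAbsent(key, CalcServer::heavyCalc);
            byte[] body = Long.toString(value).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
    }

    // Placeholder for the expensive per-row computation.
    static long heavyCalc(String key) {
        return key.hashCode();
    }
}
```

The point of the design is that only this one process ever holds the data set; scaling the web tier no longer multiplies copies of the state.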

+4

It may be a cliché, but data always expands to fill the space you give it. Your data may fit in memory now, but I guarantee that it won't in the future. How far off that point is should determine your architecture. The statefulness of your application is just a symptom of this larger problem.

Are all the calculations over the entire data set? Is this something you could run as a batch at night and have ready for people during the day? How time-sensitive is it?

I think these are the questions you need to answer, because at some point you won't be able to buy enough memory to hold the data you need. It may seem silly right now, but you should plan as if it were true. Many developers I've spoken to don't think about what success looks like and what impact it will have on their projects.

+1

I agree with you - this doesn't sound right, but I'd need more detail to say for sure.

You mention a large data set and heavy calculations, but you don't say how the data is updated once the calculations are done, whether it changes a little at a time or as a whole data set, and so on. It sounds a lot like a batch job you could run offline daily.

If that's the case, I'm not sure where the bottleneck is. Do your web users simply run queries after the number-crunching is complete? Is the data read-only, or read-mostly, for users? Or do they change the data on the fly all the time?

I wonder if your persistence technology is holding you back. Perhaps a NoSQL alternative would fit your problem better - something like a distributed MongoDB cluster.

+1

To my mind, this is as much a question about the database engine as it is about distributing the web servers. Why can't your (central) database engine perform the calculations (fast enough)?

You can store pre-calculated values and mark them as stale whenever the underlying data changes in a way that requires a recalc. There is no way around recalculating when the data changes; you just need to control when and how the recalculation happens, since it affects the consumers of the data.
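The stale-marking approach might be sketched like this (class and field names are illustrative, not from the original application): each pre-calculated value carries a dirty flag, writers flip the flag when the underlying data changes, and readers recompute lazily on the next access:

```java
import java.util.function.Supplier;

// Sketch of lazy invalidation: a pre-calculated value is marked stale when
// its underlying data changes, and is recomputed only on the next read.
public class Precomputed<T> {
    private final Supplier<T> recalc;  // the heavy calculation
    private T value;
    private boolean stale = true;

    public Precomputed(Supplier<T> recalc) {
        this.recalc = recalc;
    }

    public synchronized T get() {
        if (stale) {               // recompute only when marked obsolete
            value = recalc.get();
            stale = false;
        }
        return value;
    }

    // Called whenever the underlying data changes.
    public synchronized void markStale() {
        stale = true;
    }
}
```

The same idea works at the database level: keep the pre-calculated values in a table with a stale flag, and have writes to the underlying data set the flag instead of recomputing inline.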

+1

Source: https://habr.com/ru/post/650554/

