Very short answer
The key is architecture. The way is to divide and conquer. The approach is to relax.
Longer answer
Architecture
The most important aspect of a building is its architecture. A space is bounded by walls and floors, windows and ceilings, i.e. the elements of the design itself. But the architect's goal is not to design walls. Walls are designed as a secondary part of the real work: designing the space that the walls form. We do not build buildings to have walls; we build them to have space inside.
We first define the functionality that we want from the software. Then we move on to the details that allow us to deliver it, i.e. to create a product. The technologies we use are not the main thing; they are the walls, and what we really want is the space.
With a good understanding of what we want from the software we create, engineering becomes much simpler and almost automatic. Technical difficulties then show up with well-understood definitions and, fortunately, obvious solutions.
Divide and conquer
One very important indicator of good architecture is that it clearly defines the components of the solution. When we can see these components of the big picture separately, we can divide the work into separate parts. Then we can build those parts separately and have them work together.
Relax
Perhaps this title sounds like I'm trying to bring some kind of humor to the text; but no, please don't get me wrong. If you have a good architecture, you can relax, and so can your web servers, database servers, engineers, clients, users and the rest of the world. If this is not the case, then you should go back to work on your architecture.
BUT?
Hey, what did I talk about here? That was a few paragraphs and three titles, and I did not use a single software term other than "software". Such abstract talk is for people who have nothing to do all day but lazily walk and talk, talk and talk ... We are software people, and we don't have time for this. Hey, Hassan, cut it short ..!
OK, I'll try; but first of all, relax ... then we can take a look at some real examples.
Examples
Let's say we are developing a web publishing service for professionals as well as individuals. Each client will have their own website that will work in our system. These websites may be regular personal websites with very few visitors, or if our business is lucky, some of our other customers may be large publications such as the NY Times.
We need to solve two scalability problems. The first is scaling our business, our system, as we gain more and more customers and launch more and more websites. This is a rather simple problem compared to the second one: scaling a single website as it gets more and more visitors, more and more data, and more and more applications working on this data.
We can rewrite the question “how to scale” as “how to divide” to see the solution more clearly. If we can divide something into small pieces, we can scale it by adding more resources to handle these pieces, i.e. by scaling horizontally.
We will have data, and applications that work on this data. Say we have one database server and one web server, and let's try to make this setup scalable.
Consider the web servers that we will run for our service: if we do not store data on these machines, they become generic, interchangeable components, small data clients that connect the data with the rest of the world. Because our web servers are light, dumb and empty, we can easily add many of them to handle an increasing number of requests.
Well, turning web servers into just dumb proxies is not a smart idea. We need them to do something: run applications. And since web servers are the easiest thing to multiply in our architecture, we want to do as much as possible on these web servers. We will continue this complex issue under the heading “Smarter Separation” below. Before that, let's see what architecture we currently have on the table and how well it scales.
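A minimal Python sketch of this idea, under stated assumptions: the class names `StatelessWebServer` and `RoundRobinBalancer`, the server names, and the dictionary standing in for the database are all hypothetical. It only illustrates that servers which hold no data are interchangeable, so adding one more is trivial.

```python
from itertools import cycle


class StatelessWebServer:
    """Hypothetical web server that keeps no site data of its own."""

    def __init__(self, name, database):
        self.name = name
        self.database = database  # shared store; the only place state lives

    def handle(self, key):
        return f"{self.name}: {self.database[key]}"


class RoundRobinBalancer:
    """Hands each request to the next server in turn."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._next = cycle(self._servers)

    def add(self, server):
        # Any new server is as good as the old ones, because none holds data.
        self._servers.append(server)
        self._next = cycle(self._servers)  # rotation restarts from the first server

    def route(self, key):
        return next(self._next).handle(key)


database = {"home": "Welcome"}
balancer = RoundRobinBalancer(
    [StatelessWebServer("web-1", database), StatelessWebServer("web-2", database)]
)
responses = [balancer.route("home") for _ in range(3)]
```

Whichever server answers, the response body is the same; only the server name differs.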
We use load balancers to make many web servers work in parallel, and we use DNS to divide the websites into groups (even before the requests reach our system), each group pointed at its own load balancer. For example, a.com, b.com and c.com go to load balancer 1, a-very-big-website.com goes to load balancer 2, ... Each group of a load balancer, a set of web servers and a database server forms a separate universe in our system. Now we can have millions of websites and grow our system by adding more of these individual universes without any restrictions. We can serve as many customers with as many websites as our marketing department can bring in. Our first problem is solved. But what about running large, really large sites?
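The grouping described above could be modelled roughly like this. It is only a sketch: `Universe`, `DnsTable` and the server names are hypothetical, and the dictionary merely stands in for real DNS records.

```python
from dataclasses import dataclass, field


@dataclass
class Universe:
    """One self-contained group: a load balancer, its web servers, one database."""
    load_balancer: str
    web_servers: list
    database: str
    hosts: set = field(default_factory=set)


class DnsTable:
    """Stand-in for DNS: resolves a host name to the universe that serves it."""

    def __init__(self):
        self._records = {}

    def assign(self, host, universe):
        universe.hosts.add(host)
        self._records[host] = universe

    def resolve(self, host):
        return self._records[host]


small_sites = Universe("lb-1", ["web-1", "web-2"], "db-1")
big_site = Universe("lb-2", ["web-3", "web-4", "web-5"], "db-2")

dns = DnsTable()
for host in ("a.com", "b.com", "c.com"):
    dns.assign(host, small_sites)
dns.assign("a-very-big-website.com", big_site)
```

Growing the business then means creating another `Universe` and assigning new hosts to it; the existing universes are not touched.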
Smarter separation
Of course, we cannot split a single website into separate universes, as we did with separate websites; but this does not mean that we cannot separate at all. We will continue to divide and conquer. To do this, we need to look in more detail at the problems we are solving.
What is a website? Web pages, supporting files such as css and js, multimedia content such as images and video files, and data, a lot of data. Thanks to CDNs and the huge storage services that cloud computing systems provide, static files are no longer an important part of our problem ...
The real issue is web page rendering. We assumed above that our web servers are very lightweight, generic interfaces to our database. We have not yet decided how to run applications in our universes. Now it's time to do it.
Each request to our system is addressed to a site and processed by the application running for that site. The very first thing our web servers will do is decide which site the request belongs to. On our database server, we keep a table that maps host names to sites. For each new client website, we add one or more domains to this table, mapped to that site. With each request to our web servers, we query the database server and decide which site to load. Good?
No, it's not good. This is terrible. Why?
Big numbers, small numbers
We have a small number of websites, but a very large number of requests. The number of websites in a universe changes much less frequently than other types of data, such as comments on blog sites. In an established universe, this table is updated perhaps several times a day. Querying such a tiny (several thousand records, tiny!) table for every request, again and again all day, is not smart. The smart way is to keep copies of this table on the web servers and refresh them only after the table changes. How do we know when the list of sites has been updated? We can store a row holding a version number for the table and increase this number with each update. Or we can keep the timestamp of the last update. Web servers can ask the database server for this number and compare it with the version of their local in-memory copy. If the table is newer, we pull the data and overwrite the local copy in memory. Thus, we reduce thousands of queries to a handful. Big numbers, small numbers ...
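Here is a small Python sketch of the version-number approach. All names (`SiteDatabase`, `SiteCache`, `full_reads`) are made up for illustration, and the "database" is just an in-process object; in a real system `get_version` and `read_table` would be two queries, and `refresh_if_stale` would be called on a timer rather than per request.

```python
class SiteDatabase:
    """Hypothetical stand-in for the database server."""

    def __init__(self):
        self._host_to_site = {}
        self._version = 0
        self.full_reads = 0  # how many times the whole table was fetched

    def set_host(self, host, site):
        self._host_to_site[host] = site
        self._version += 1  # bump the version on every change to the table

    def get_version(self):
        return self._version  # cheap check: one small number

    def read_table(self):
        self.full_reads += 1
        return dict(self._host_to_site), self._version


class SiteCache:
    """In-memory copy of the host-to-site table kept on each web server."""

    def __init__(self, database):
        self._database = database
        self._table = {}
        self._version = -1  # differs from any real version, so the first check refreshes

    def refresh_if_stale(self):
        if self._database.get_version() != self._version:
            self._table, self._version = self._database.read_table()

    def site_for(self, host):
        return self._table.get(host)  # served from memory, no database query
```

With this in place, a thousand requests for the same host cost one full read of the table plus a few version checks, instead of a thousand queries.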
At this point, the materials we use in our buildings start to make a difference: which languages, which database platforms and systems, etc. Now they matter because they can improve our work. For example, our database server may have a mechanism for notifying web servers when the table is updated. Then we can go even further and completely eliminate the unnecessary queries against the domain-to-site table. So if the systems we have chosen provide such mechanisms, these systems are a good choice for our architecture.
Separating things rationally happens automatically when we understand well what we want from our software. It is very difficult to scale database servers, because we need the data together. By increasing the number of web servers, we scale horizontally without any restrictions; but this does not apply to a database server. The database server must have access to the data, and machines have limits that we cannot scale past effectively.
Every database system provides scaling methods such as sharding or a shared-nothing architecture. There may be times when you should use them; but from what I see on forums, blogs and other places where people share their experiences, IMHO people use them too aggressively and erroneously. They let their databases get bigger and bigger, and then it's "hey, it's time to scale, let's add a few shards." 99% of these uses are blind. People throw software at their problems and expect them to be solved like magic. Unfortunately, they very soon realize that there is no magic.
We must steer clear of blind decisions by watching our numbers: big numbers, small numbers. We must also understand the internal operation of the system and solve problems through architecture rather than through heavy use of materials.
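A notification mechanism of this kind can be sketched as a simple publish/subscribe pair. This is an in-process illustration only; `NotifyingDatabase` and `SubscribedWebServer` are hypothetical names, and a real setup would rely on whatever the chosen database offers (PostgreSQL's LISTEN/NOTIFY is one example of such a facility).

```python
class NotifyingDatabase:
    """Hypothetical database that pushes table changes to its subscribers."""

    def __init__(self):
        self._host_to_site = {}
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)
        callback(dict(self._host_to_site))  # send the current table once

    def set_host(self, host, site):
        self._host_to_site[host] = site
        for callback in self._subscribers:
            callback(dict(self._host_to_site))  # each subscriber gets its own copy


class SubscribedWebServer:
    """Web server that never polls; it only reacts to pushed updates."""

    def __init__(self, database):
        self.table = {}
        self.updates_received = 0
        database.subscribe(self._on_update)

    def _on_update(self, table):
        self.table = table
        self.updates_received += 1

    def site_for(self, host):
        return self.table.get(host)
```

Compared with checking a version number, the web servers here send no queries at all while the table is unchanged.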
Here is the architectural solution: Archived solution ( Calatrava ).
Here are other solutions that depend on materials instead of good architecture: [ Blind-run Solution 1 ], [ Blind-run Solution 2 ]
Judge the differences for yourself.
How can we scale the database server? Instead of blindly splitting tables down the middle, we can rethink our data. Can we separate user account information from site templates? Sure, why not? Can we use different database servers for old and fresh data? That is a bit more complicated, especially considering search; but why not?
Separate thoughtfully, not blindly! I agree that there will be times when you can no longer split things this way; but let's be honest, how many of us work at Google or Facebook?
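As a rough illustration of splitting by meaning rather than by row count, here is a tiny routing sketch. Everything in it is an assumption made for the example: the `DataRouter` class, the server names, the "article" kind of data, and the cut-off date between old and fresh data.

```python
from datetime import date


class DataRouter:
    """Hypothetical router: picks a database server by the kind and age of the data."""

    def __init__(self, archive_before):
        self.archive_before = archive_before  # anything older counts as "old" data

    def server_for(self, kind, created=None):
        if kind == "account":
            return "db-accounts"
        if kind == "template":
            return "db-templates"
        if kind == "article":
            if created is not None and created < self.archive_before:
                return "db-articles-archive"
            return "db-articles-fresh"
        raise ValueError(f"unknown kind of data: {kind}")


router = DataRouter(archive_before=date(2020, 1, 1))
```

A search across old and fresh data would have to ask both article servers and merge the results, which is the complication mentioned above.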
- Hey man, we have a very large dataset and when we run ...
- Shush. First go back and check your dataset. Should it really be a large dataset?
In most cases, no, it is not. We just don’t want to admit it ...
How to migrate
Rebuilding everything from scratch takes time, which many companies cannot afford. The best way is to re-architect our current system without rewriting every component, instead separating the existing pieces and redefining them as components. This is basically an analysis followed by small changes. Every function call in the system can serve as a division point: we can simply cut the system into two parts at that point.
An unhurried look at your current system for just a few hours will give you a ton of ideas on how to separate these parts. Once you have separated them, it is much easier to reverse engineer each part and then rebuild the new system piece by piece. If I have a building and need to construct a larger one on the same land without moving out all the people who already live there, that is a very difficult job, but not impossible. With software instead of buildings, it is a lot easier. So?
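To make the "function call as a division point" idea concrete, here is a small sketch. The names (`CommentSource`, `InProcessComments`, `RemoteComments`, `render_page`) and the comments example are hypothetical; the point is only that the caller depends on the call, not on what sits behind it.

```python
from typing import Callable, Iterable, Protocol


class CommentSource(Protocol):
    """The division point: everything the page renderer needs from 'comments'."""

    def comments_for(self, page_id: str) -> list[str]: ...


class InProcessComments:
    """The existing code, left as it is, wrapped as a component."""

    def __init__(self, rows: dict[str, list[str]]):
        self._rows = rows

    def comments_for(self, page_id: str) -> list[str]:
        return self._rows.get(page_id, [])


class RemoteComments:
    """A later replacement; `fetch` is a hypothetical call to a separate service."""

    def __init__(self, fetch: Callable[[str], Iterable[str]]):
        self._fetch = fetch

    def comments_for(self, page_id: str) -> list[str]:
        return list(self._fetch(page_id))


def render_page(page_id: str, comments: CommentSource) -> str:
    # The renderer does not know or care which side of the cut it talks to.
    items = "".join(f"<li>{c}</li>" for c in comments.comments_for(page_id))
    return f"<ul>{items}</ul>"
```

The first step only introduces the cut and keeps the old code behind it; moving the comments to their own server later does not require touching `render_page`.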
Relax
This is software. It is soft. You can copy your data, run tests on it, delete everything, and copy it again a million times. Once your architecture is well designed, your mistakes need not be catastrophic. It is very difficult to turn a 6-seat dining table into one that can serve 60 guests; but software ... software is soft, and we can easily do such things. Relax.
- The above question concerns an area that cannot be covered in just a few paragraphs. Based on this part of the question: "However, I am still open to more general improvements, and not to those who are suitable for a wider audience." I tried to keep things general, without going into details. Although I tried to give some tiny examples of practical applications of my principles, I know that this short text leaves a lot of points open. I appreciate any criticism and questions in the comments.