Short answer
The CAP theorem (also known as Brewer's theorem) cannot be beaten for a single piece of information (for example, a consistent database). If you have a horizontally scaled database, you will not get both consistency and performance. This conclusion follows from the laws of physics and can be deduced from Brewer's theorem and Einstein's theories of relativity. You need to scale up/in, not out. Not very "cloudy," but as Galileo's enemies would probably admit if they were alive today, nature does a poor job of honouring human fashion.
Scaling Consistent Data
I'm sure there are other approaches, but Starcounter works by hosting the database image in RAM. Instead of moving database data to the application code, parts of the application code are moved into the database. Only the data in the final response is moved from its original place in RAM (where the data lived in the first place). This means that most of the data stays put even when millions of requests are processed every second. The downside is that the database needs to know the programming language of your application logic. The upside, however, is obvious if you have ever tried to serve millions of HTTP requests per second, each of which requires extensive database access.
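To make the "code moves to the data" idea concrete, here is a minimal sketch in Python. The names (`remote_db`, `run_in_process`, the invoice fields) are my own illustrative assumptions, not Starcounter's actual (C#/.NET) API; the point is only the contrast between shipping rows to the application and shipping a small piece of logic to the database image.

```python
# Hypothetical sketch: "ship data to code" vs. "ship code to data".
# None of these names are real Starcounter APIs; they only illustrate the idea.

# Traditional approach: the application pulls rows over the wire,
# deserializes them, sums them locally, and most of the moved bytes
# are thrown away once the aggregate has been computed.
def total_unpaid_traditional(remote_db):
    rows = remote_db.query("SELECT amount FROM Invoice WHERE paid = 0")  # data moves to code
    return sum(row.amount for row in rows)

# "Code to data" approach: the aggregation runs inside the database's
# address space, so only the final answer (one number) leaves the image.
def total_unpaid_in_database(db):
    def job(image):
        return sum(inv.amount for inv in image.invoices if not inv.paid)
    return db.run_in_process(job)  # hypothetical helper; only the result is copied out
```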
One more answer
The question is a good one. It is not surprising that it feels strange, since it was only a few years ago that CAP was proven (that is, turned into a theorem). Many developers feel like a child being told by a theoretical physicist to stop looking for a perpetual motion machine because it cannot work. We still want a consistent database that scales out, don't we?
CAP Theorem
The CAP theorem states that a single piece of information cannot have consistency (C), availability (A) and partition tolerance (P) at the same time. It applies to a unit of information (for example, a database). You can, of course, have independent pieces of information that behave differently. One piece may be AP, another may be CA and a third may be CP. You just cannot have C, A and P for one and the same piece of information.
The reason "P" is impossible for a consistent and available database is that a scaled-out database MUST signal between nodes. The conclusion must be that even a hundred years from now, CAP says that a single piece of consistent data will have to live on hardware connected by hard wires or beams of light.
Problem with P in CAP
The problem is performance when you apply horizontal scaling to an available, consistent database. Since good performance was the very reason for scaling horizontally in the first place, this is very bad. Because each node must communicate with the other nodes whenever the database is accessed, in order to achieve consistency, and given that signalling is ultimately limited by the speed of light, you are left with the sad but true fact that database scientists (as well as CPU scientists) are not just being stubborn when they refuse to see scale-out as the magic silver bullet. It will not happen because it cannot happen (however, parts of your database can be placed in an AP set, so remember, we are talking about consistent data here). Add Einstein's theories to the CAP theorem, and the small box wins over the cloud data center for consistent data.
Perpetual motion machines and CAP
The state of affairs in the database community is a bit like the state of perpetual motion machines back when horse and carriage was the way to get to work. Without any theoretical proof against them, the patent offices granted hundreds of patents for impossible perpetual machines. Today we can laugh at this, but we have a similar situation in the database industry with consistent, scale-out databases. When you hear someone claim to have a scale-out ACID database, be cautious. It was only after the dot-com days that mathematicians at MIT (Seth Gilbert and Nancy Lynch) proved Brewer right, officially turning his conjecture into the CAP theorem, so the hunt for the impossible has unfortunately not died off just yet. You can compare this, if you like, with how the laggards kept trying to invent the perpetual motion machine for years after modern theoretical physics should reasonably have put an end to it. Old habits die hard (my apologies to anyone still making drawings of bearings and arms moving ad infinitum of their own accord - I don't mean to be offensive).
CAP and performance
All is not lost, however. Not all pieces of information need to be consistent. And not all pieces need to scale out. You just have to accept Brewer's theorem and make the best of it.
For applications such as Facebook, consistency is dropped. This is fine, since data is entered once and then manipulated by a single user. Still, we can experience the side effects in everyday Facebook use, such as things popping in and out of existence for a while.
In most business applications, however, the data needs to be correct. The sum of all accounts in your bookkeeping needs to be zero. Your inventory must equal 8 if you sold 2 out of 10 items, even when several users are buying the same item.
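As a minimal illustration of why consistency matters here, the sketch below uses plain Python with a lock rather than any particular database API: a stock counter stays correct under concurrent buyers only because the read-check-write sequence is made atomic; without it, two buyers could both read 10 and both write 9.

```python
import threading

# Minimal sketch (not a real database API): a consistent decrement of stock.
# With the lock, two concurrent buyers of one unit each always leave 8 of 10;
# without it, both could read 10 and both write 9.
class Inventory:
    def __init__(self, stock):
        self.stock = stock
        self._lock = threading.Lock()

    def buy(self, quantity):
        with self._lock:               # plays the role of a transaction
            if self.stock < quantity:
                raise ValueError("out of stock")
            self.stock -= quantity

inv = Inventory(stock=10)
threads = [threading.Thread(target=inv.buy, args=(1,)) for _ in range(2)]
for t in threads: t.start()
for t in threads: t.join()
assert inv.stock == 8
```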
The problem with scaling out available data is that you have to make do without partition tolerance. This fancy word simply means that you have to signal between the nodes of your cloud at all times. And since it takes light a few nanoseconds to travel a single meter, this becomes impossible without your scale-out resulting in lower throughput rather than higher throughput. Of course, this is only true for consistent data. The implications of this have been known to the engineers at Intel, AMD, Oracle and others for a long time. It is not that their scientists have not heard of scale-out. They have simply come to accept the world as Einstein described it.
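To get a feel for the orders of magnitude, here is a small back-of-the-envelope calculation; the distances and latencies are round figures I am assuming for illustration, not numbers from this answer.

```python
# Back-of-the-envelope latency figures (assumed, illustrative values only).
speed_of_light_m_per_s = 3.0e8
distance_between_racks_m = 30             # assumed cable run inside a data center

one_way_light_time_s = distance_between_racks_m / speed_of_light_m_per_s
print(f"light alone, one way: {one_way_light_time_s * 1e9:.0f} ns")      # ~100 ns

# A real network round trip (NIC, switch, OS) is far worse in practice:
assumed_round_trip_s = 50e-6              # ~50 microseconds, an optimistic LAN RTT
ram_access_s = 100e-9                     # ~100 ns for a main-memory access
print(f"round trips per second, serialized: {1 / assumed_round_trip_s:,.0f}")
print(f"RAM accesses that fit in one round trip: {assumed_round_trip_s / ram_access_s:,.0f}")
```

If every consistent database access has to wait for even one such round trip, coordination between nodes, not computation, becomes the ceiling on throughput.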
Some comfort in the darkness
If you do the math, you will find that a single PC has instructions to spare for every human being living on Earth for every second it is running (google "modern CPU" and "MIPS"). If you do some more math, for example taking the total turnover of Amazon.com (you can find it at www.nasdaq.com) divided by the price of an average book, you will find that the total number of sales transactions fits in the RAM of a single modern PC. The cool thing is that the number of items, customers, orders, products and so on occupies roughly the same amount of space in 2012 as it did in 1950. Images, video and audio have grown in size, but numeric and textual information does not really grow per item. Sure, the number of transactions grows, but not at the same pace as computer power grows. So the logical solution is to scale out the read-only and AP data, and to scale in/up the business data.
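A rough version of that math, using round numbers I am assuming for illustration (they are not figures from this answer or from Nasdaq):

```python
# Back-of-the-envelope check with assumed round numbers (illustrative only).
world_population          = 7e9      # people, roughly 2012
cpu_instructions_per_sec  = 1e11     # ~100,000 MIPS for a modern multi-core CPU
print(cpu_instructions_per_sec / world_population)   # ~14 instructions per person per second

amazon_turnover_per_year  = 5e10     # USD, assumed 2012-era order of magnitude
average_item_price        = 25       # USD
bytes_per_order_record    = 200      # ids, amount, timestamp, ...

orders_per_year = amazon_turnover_per_year / average_item_price          # ~2e9 orders
ram_needed_gb   = orders_per_year * bytes_per_order_record / 1e9
print(f"{orders_per_year:.0e} orders/year -> ~{ram_needed_gb:.0f} GB of order records")
```

A few hundred gigabytes of order records is well within the RAM of a single large server, which is the point of the paragraph above.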
"Scaling" instead of "scaling"
Database engines and business logic running in a virtual machine (such as the Java VM or the .NET CLR) typically use fairly efficient machine code. This means that moving memory around is the overshadowing bottleneck of total throughput for a consistent database. This is often referred to as the memory wall (Wikipedia has some useful information).
The trick is to transfer code to the database image instead of data from the database image to the code (when using an MVC or MVVM pattern). This means that the consuming code executes in the same address space as the database image and that data never moves (the disk merely secures transactions and images). Data can remain in the original database image and does not have to be copied into the application's memory. Instead of treating the database as a RAM database, the database is treated as primary memory. Everything stays put.
Only data that is part of the final user response is moved out of the database image. For a large-scale application with hundreds of millions of simultaneous users, this typically amounts to only a few million requests per second, something a single PC has no problem handling, given that the HTTP packaging is done on gateway servers. Fortunately, those servers scale out beautifully, since they do not need to share data.
As it turns out, disk drives are fast at sequential writes, so a RAIDed set of drives can secure terabytes of changes every minute.
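To put that throughput figure in context, here is a rough calculation with per-drive rates I am assuming (they are not measurements from this answer):

```python
# Rough numbers, assumed for illustration, to put "terabytes per minute" in context.
target_tb_per_minute = 1
required_gb_per_sec  = target_tb_per_minute * 1000 / 60   # ~17 GB/s aggregate
hdd_seq_write_mb_s   = 200     # assumed per-drive sequential write rate, spinning disk
ssd_seq_write_mb_s   = 2000    # assumed per-drive sequential write rate, SSD

print(f"aggregate needed: {required_gb_per_sec:.0f} GB/s")
print(f"~{required_gb_per_sec * 1000 / hdd_seq_write_mb_s:.0f} HDDs "
      f"or ~{required_gb_per_sec * 1000 / ssd_seq_write_mb_s:.0f} SSDs, writing sequentially")
```

Sequential, append-only logging is what makes such aggregate rates reachable at all; the same drives doing random writes would be orders of magnitude slower.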
Horizontal Scaling in Starcounter
Normally you do not scale out a Starcounter node. It scales in/up rather than out. This works well for a few million simultaneous users. To go above that, you need to add more Starcounter nodes. They can be used to partition data (but then you lose consistency, and Starcounter is not designed for partitioning, so it is less elegant than solutions such as VoltDB). A better alternative is therefore to use the additional Starcounter nodes as gateway servers. These servers simply accumulate all incoming HTTP requests for a millisecond at a time. That might sound like a short time, but it is enough to accumulate thousands of requests once you have decided that you need to scale Starcounter. The batch of requests is then sent to the ZLATAN node (Zero LATency Atomicity Node) a thousand times per second. Each such batch can contain thousands of requests. In this way, a few hundred million user sessions can be served by a single ZLATAN node. Although you can have several ZLATAN nodes, only one ZLATAN node is active at any given time. This is how the CAP theorem is honoured. To go beyond that, you need to consider the same trade-offs as Facebook and others.
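A minimal sketch of the gateway-side batching described above, written in Python with hypothetical names (`incoming`, `forward_batch`, the 1 ms window); it is not Starcounter's actual gateway code, only an illustration of accumulating requests for roughly a millisecond and forwarding them as one batch to the single active node.

```python
import queue, threading, time

# Hypothetical sketch of a gateway server: accumulate incoming requests for
# about one millisecond, then forward them to the single active ZLATAN node
# as one batch (~1000 batches per second). Names and API are illustrative.
incoming = queue.Queue()

def forward_batch(batch):
    # Stand-in for the real network call to the active ZLATAN node.
    print(f"forwarding batch of {len(batch)} requests")

def gateway_loop(batch_window_s=0.001):
    while True:
        deadline = time.monotonic() + batch_window_s
        batch = []
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(incoming.get(timeout=remaining))
            except queue.Empty:
                break
        if batch:
            forward_batch(batch)

threading.Thread(target=gateway_loop, daemon=True).start()

# Simulate a burst of requests arriving at the gateway.
for i in range(5):
    incoming.put(f"GET /item/{i}")
time.sleep(0.01)
```

Because the gateways only buffer and forward, they do not need to share any state with each other, which is why they scale out freely while the consistent data stays on one active node.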
Another important note is that the ZLATAN node does not serve applications with data. Instead, the application controller code is run by the ZLATAN node. The cost of serializing/deserializing data and sending it to an application is far greater than the cost of processing the controller logic cycles. That is, the code is sent to the database instead of the other way around (the traditional approach is that applications ask for data or send data).
Making the "shared-all" node faster by doing less
Using the database as the heap of the programming language, instead of as a remote system for serializing and deserializing, is the trick that Starcounter calls VMDBMS. When the database is in RAM, you do not have to move data from one place in RAM to another place in RAM, which is what happens with most RAM databases.