How far can you go with eventual consistency and no transactions (aka SimpleDB)?

I really want to use SimpleDB, but I worry that without real locking and transactions the whole system is fatally flawed. I understand that for high-read / low-write applications this makes sense, since the system eventually becomes consistent — but what about the time in between? It seems that even a perfectly correct query against an inconsistent database could propagate the inconsistency across the entire database in a way that is very hard to track down. I hope I'm just being paranoid ...

+6
sql database amazon-web-services amazon-simpledb
2 answers

This is the pretty classic battle between consistency and scalability and, to some extent, availability. Some data does not always have to be consistent. For example, look at digg.com and the digg count on a story. There is a good chance that value is duplicated on the story record rather than computed by forcing the database to join against the user_digg table. Does it matter if that number is not perfectly accurate? Probably not. Then using something like SimpleDB might be fine. However, if you are writing a banking system, you should probably value consistency above all else. :)

If you don't know from day one that you will have to deal with massive scale, I would stick with simple, more traditional systems such as an RDBMS. If you work somewhere with a reasonable business model, you will hopefully see a big spike in revenue whenever there is a big spike in traffic. You can then use that money to help solve your scaling problems. Scaling is hard, and scaling is hard to predict: most of the scaling problems that hurt you will be the ones you never expected.

I would much rather get the site out the door and spend a few weeks fixing scaling problems once the traffic actually arrives than spend so long engineering for scale up front that we never release it because we ran out of money. :)

+4

Assuming you mean this SimpleDB, you are not just being paranoid; there are real reasons not to use it as a real-world DBMS.

The properties you get from transaction support in a DBMS can be abbreviated as ACID: Atomicity, Consistency, Isolation, and Durability. A and D mostly concern system failures, while C and I concern regular operation. These are all things people completely take for granted when working with commercial databases, so if you work with a database that lacks one or more of them, you may be in for any number of unpleasant surprises.

Atomicity. Any transaction either happens completely or not at all (i.e. it is either committed or aborted). This applies to individual statements (e.g. "UPDATE table ...") as well as to longer, more complicated transactions. If you don't have this, then anything that goes wrong (a full disk, a computer crash, etc.) can leave something half-done. In other words, you can never rely on the DBMS to really do what you tell it to, because any number of real-world problems can get in the way, and even a simple UPDATE statement may only partially complete.
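The idea can be sketched with SQLite from Python's standard library (my illustration, not SimpleDB; the `accounts` table and the `transfer` function are invented for the example). A transfer either commits as a whole or rolls back as a whole, so a failure between its two halves cannot leave money half-moved:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 100)")
conn.commit()

def transfer(conn, src, dst, amount):
    # Debit one account ...
    conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                 (amount, src))
    # ... and simulate a crash before the matching credit happens.
    if amount > 100:
        raise RuntimeError("disk full")
    conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                 (amount, dst))
    conn.commit()

try:
    transfer(conn, "alice", "bob", 150)
except RuntimeError:
    conn.rollback()  # the half-finished transfer is undone entirely

alice = conn.execute("SELECT balance FROM accounts WHERE name='alice'").fetchone()[0]
bob = conn.execute("SELECT balance FROM accounts WHERE name='bob'").fetchone()[0]
print(alice, bob)  # 100 100 — the failed transfer left no trace
```

Without the rollback, alice would be short 150 with nobody credited — exactly the "partially completed UPDATE" problem described above.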

Consistency. Any rules you have set up for the database will always be upheld. For example, if you have a rule that says A must always equal B, then nothing anyone does to the database can break that rule — the violating operation simply fails. This isn't as important if all of your code is perfect ... but really, when is that ever the case? Plus, once you lose this safety net, things get really yucky when your data goes bad ...
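Here is the same idea as a minimal SQLite sketch (again my illustration, with an invented table): a CHECK constraint is a rule the database itself refuses to let anyone violate, no matter how buggy the application code is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The rule: a balance may never go negative.
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, "
             "balance INTEGER CHECK (balance >= 0))")
conn.execute("INSERT INTO accounts VALUES ('alice', 100)")
conn.commit()

violated = False
try:
    # Buggy application code tries to break the rule ...
    conn.execute("UPDATE accounts SET balance = -50 WHERE name = 'alice'")
except sqlite3.IntegrityError:
    violated = True  # ... and the database refuses outright

balance = conn.execute(
    "SELECT balance FROM accounts WHERE name = 'alice'").fetchone()[0]
print(violated, balance)  # True 100 — the rule held, the data is untouched
```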

Isolation. Any actions performed on the database will behave as if they were executed serially (one at a time), even when in reality they happen concurrently (interleaved with each other). If more than one user hits the database at the same time and you don't have this, then things you can't even dream of will go wrong; even atomic statements can interact with each other in unforeseen ways and make a mess of things.
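A deterministic toy interleaving (plain Python, no database; the schedule is contrived for illustration) shows the classic "lost update" that isolation prevents. Two transactions each read, increment, and write a shared balance; run serially the result would be 120, but this interleaving silently loses one update:

```python
balance = 100

# T1 and T2 each intend: read balance, add 10, write it back.
t1_read = balance        # T1 reads 100
t2_read = balance        # T2 reads 100 (interleaved before T1 writes)
balance = t1_read + 10   # T1 writes 110
balance = t2_read + 10   # T2 writes 110 — T1's update is silently lost

print(balance)  # 110, not the 120 any serial schedule would produce
```

An isolated DBMS would force one transaction's read-modify-write to complete before the other's begins (or abort one of them), making this schedule impossible.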

Durability. If you lose power or the software crashes, what happens to the database transactions that were in progress? If you have durability, the answer is "nothing — they are all safe." Databases achieve this using something called an undo/redo log, where everything you do to the database is logged first (usually to a separate disk for safety) so that the current state can be reconstructed after a failure. Without this, the other properties above are of little use, because you can never be 100% sure the system will still be in a consistent state after a crash.
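A toy write-ahead log (my sketch of the general principle, not any real DBMS's log format) shows the trick: force the log record to stable storage before touching the state, and after a crash, replay the log to rebuild the state.

```python
import json
import os
import tempfile

logpath = os.path.join(tempfile.mkdtemp(), "wal.log")
state = {}

def apply_change(key, value):
    # 1. Log the change and force it to disk FIRST ...
    with open(logpath, "a") as log:
        log.write(json.dumps({"key": key, "value": value}) + "\n")
        log.flush()
        os.fsync(log.fileno())  # record must hit stable storage before step 2
    # 2. ... only then apply it to the live state.
    state[key] = value

def recover():
    # After a crash, replay the surviving log to rebuild the state.
    recovered = {}
    with open(logpath) as log:
        for line in log:
            rec = json.loads(line)
            recovered[rec["key"]] = rec["value"]
    return recovered

apply_change("a", 1)
apply_change("b", 2)
state = None            # simulate a crash: in-memory state is gone
state = recover()       # ... but the log lets us rebuild it
print(state)            # {'a': 1, 'b': 2}
```

Real logs also record undo information and transaction boundaries so that uncommitted work can be rolled back during recovery; this sketch only covers the redo half.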

Do any of these matter to you? The answer depends entirely on the kinds of transactions you run and what guarantees you want when things fail. There may be cases (e.g. a read-only database) where you don't need them, but as soon as you start doing anything non-trivial and something goes wrong, you will wish you had them. Maybe it's fine for you to simply restore from backup whenever something unexpected happens, but I suspect it isn't.

Also note that dropping all of these protections does not magically make your database perform better; in fact, the opposite is probably true. That's because real DBMS software also contains tons of code to optimize query performance. So if you write a query that joins six tables in SimpleDB, don't assume it will figure out the optimal way to run that query — you could end up waiting hours for it to complete, while a commercial DBMS could use an indexed hash join and serve it up in 0.5 seconds. There are a million little tricks for optimizing query performance, and believe me, you will really miss them when they are gone.
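You can watch an optimizer pick one of those tricks with SQLite's EXPLAIN QUERY PLAN (SQLite as a stand-in for a "real DBMS" here; the tables echo the digg example from the other answer and are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE diggs (user_id INTEGER, story_id INTEGER)")
conn.execute("CREATE INDEX idx_diggs_user ON diggs(user_id)")

# Ask the planner how it would run a lookup by user_id.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM diggs WHERE user_id = 5").fetchall()
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
# SQLite reports a SEARCH using idx_diggs_user rather than a full table
# scan — the optimizer found the cheap plan without being told.
```

In a query engine without an optimizer, that same predicate would mean scanning every row, every time.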

None of this is meant to knock SimpleDB; take it from the author of the software: "Although it is a great teaching tool, I can't imagine that anyone would want to use it for anything else."

0