Which is better in database design: a stored comment count or COUNT?

Given a site like StackOverflow, would it be better to create a num_comments column that stores the number of comments on a post and update it whenever a comment is created, or simply to get the number of rows with the COUNT function? The latter seems more readable and elegant, but the former would be more efficient. What does SO think?

5 answers

Definitely use COUNT. Storing the comment count is a classic denormalization that causes headaches. It is slightly more efficient to read, but it makes inserts much more expensive: each new comment requires not only an insert into the comments table but also a lock on the row that holds the comment counter.
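
To make the trade-off concrete, here is a minimal sketch of both options using Python's built-in sqlite3 module. The schema (posts.num_comments, a comments table keyed by post_id) and the helper names are hypothetical. The counter version needs a second write inside every insert transaction, while the COUNT version pays only at read time:

```python
import sqlite3

# Hypothetical schema for illustration: posts carries a denormalized
# num_comments counter, and comments is the normalized source of truth.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        num_comments INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL REFERENCES posts(id),
        body TEXT NOT NULL
    );
    CREATE INDEX idx_comments_post ON comments(post_id);
""")
conn.execute("INSERT INTO posts (id) VALUES (1)")
conn.commit()


def add_comment(post_id: int, body: str) -> None:
    """Insert a comment and bump the counter in the same transaction.

    The UPDATE is the extra cost of denormalization: every new comment
    also writes the post's row (in engines with row-level locking, such
    as InnoDB or PostgreSQL, that means a lock on that row).
    """
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "INSERT INTO comments (post_id, body) VALUES (?, ?)", (post_id, body)
        )
        conn.execute(
            "UPDATE posts SET num_comments = num_comments + 1 WHERE id = ?",
            (post_id,),
        )


def count_by_column(post_id: int) -> int:
    # Cheap read: one column from one row.
    (n,) = conn.execute(
        "SELECT num_comments FROM posts WHERE id = ?", (post_id,)
    ).fetchone()
    return n


def count_by_query(post_id: int) -> int:
    # Normalized read: count the rows at query time (helped by the index).
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM comments WHERE post_id = ?", (post_id,)
    ).fetchone()
    return n


add_comment(1, "First!")
add_comment(1, "Nice question.")
print(count_by_column(1), count_by_query(1))  # 2 2
```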

The former is not normalized, but it will give better performance (provided the data is read more often than it is written).

The latter is more normalized, but it will use more resources and will therefore perform worse.

It comes down to the application's requirements.

I would suggest counting the comment rows. Although the other method will be faster, counting keeps the database cleaner. Adding a count column duplicates data, not to mention requiring an extra step in the code on every insert.

If you expect millions of comments, then you may need to choose the count column approach.

I agree with @Oded: it depends on the requirements of the application, as well as on how active the site is. That said, here are my two cents:

  • I would try to avoid writes that have to be made by UPDATE triggers on the posts table whenever a new comment is added.
  • If you are concerned about data reporting, then do not do this in a transactional system. Create a reporting database and periodically update it.
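
A minimal sketch of the periodic-refresh idea, assuming two separate SQLite databases stand in for the transactional and reporting systems. The table and function names are hypothetical, and in practice the refresh would be run by a scheduler rather than called inline:

```python
import sqlite3

# Hypothetical setup: "oltp" stands in for the transactional database and
# "reporting" for a separate reporting database; both are in-memory here.
oltp = sqlite3.connect(":memory:")
oltp.executescript("""
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL
    );
    INSERT INTO comments (post_id) VALUES (1), (1), (2);
""")

reporting = sqlite3.connect(":memory:")
reporting.execute(
    "CREATE TABLE post_comment_counts ("
    "post_id INTEGER PRIMARY KEY, num_comments INTEGER NOT NULL)"
)


def refresh_reporting() -> None:
    """Rebuild per-post comment counts in the reporting database.

    Intended to run periodically, so the transactional tables need no
    triggers or counter columns at all.
    """
    rows = oltp.execute(
        "SELECT post_id, COUNT(*) FROM comments GROUP BY post_id"
    ).fetchall()
    with reporting:  # replace the old snapshot in a single transaction
        reporting.execute("DELETE FROM post_comment_counts")
        reporting.executemany(
            "INSERT INTO post_comment_counts (post_id, num_comments) VALUES (?, ?)",
            rows,
        )


refresh_reporting()
snapshot = reporting.execute(
    "SELECT post_id, num_comments FROM post_comment_counts ORDER BY post_id"
).fetchall()
print(snapshot)  # [(1, 2), (2, 1)]
```

The reporting numbers are only as fresh as the last refresh, which is usually acceptable for reports and keeps the write path of the live site untouched.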

The “right” way to design this is to use another table, join it, and COUNT. This is consistent with what database normalization teaches.
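
For reference, the normalized query this answer describes might look like the sketch below (SQLite through Python's sqlite3, with a hypothetical posts/comments schema). A LEFT JOIN is used so that posts without comments still show up with a count of 0:

```python
import sqlite3

# Hypothetical normalized schema: comments live in their own table and
# reference posts by post_id; no counter is stored anywhere.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL REFERENCES posts(id),
        body TEXT NOT NULL
    );
    INSERT INTO posts (id, title) VALUES (1, 'COUNT vs counter'), (2, 'Unanswered');
    INSERT INTO comments (post_id, body) VALUES (1, 'a'), (1, 'b'), (1, 'c');
""")

# LEFT JOIN keeps posts that have no comments; COUNT(c.id) skips the NULLs
# produced for them, so such posts get 0 rather than 1.
rows = conn.execute("""
    SELECT p.id, p.title, COUNT(c.id) AS num_comments
    FROM posts AS p
    LEFT JOIN comments AS c ON c.post_id = p.id
    GROUP BY p.id, p.title
    ORDER BY p.id
""").fetchall()

for post_id, title, n in rows:
    print(post_id, title, n)
# 1 COUNT vs counter 3
# 2 Unanswered 0
```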

The problem with normalization is that it doesn't scale. There are many ways to skin a cat, but if you have millions of queries per day and many of them touch table X, database performance drops through the floor, because the server also has to deal with concurrent writes, transactions, and so on.

To deal with this problem, the common practice is sharding. A side effect of sharding is that a table's rows are no longer stored in the same physical location, and the main consequence is that you can no longer JOIN: how can you JOIN against half of a table and get meaningful results? And, obviously, trying to JOIN against every shard of the table and merging the results would be a cure worse than the disease.
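
A toy illustration of why sharding complicates queries, assuming comments are spread across two in-memory SQLite databases by a hypothetical modulo rule on post_id. A count for a single post touches one shard, but any query over the whole table has to fan out to every shard and merge the partial results in application code:

```python
import sqlite3

NUM_SHARDS = 2  # hypothetical; real deployments size this per workload

# Each shard is a separate database holding a slice of the comments table.
shards = [sqlite3.connect(":memory:") for _ in range(NUM_SHARDS)]
for db in shards:
    db.execute(
        "CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER NOT NULL)"
    )


def shard_for(post_id: int) -> sqlite3.Connection:
    # Hypothetical routing rule: modulo on the shard key (post_id).
    return shards[post_id % NUM_SHARDS]


def add_comment(comment_id: int, post_id: int) -> None:
    db = shard_for(post_id)
    with db:  # commit the insert on the owning shard
        db.execute(
            "INSERT INTO comments (id, post_id) VALUES (?, ?)", (comment_id, post_id)
        )


def count_for_post(post_id: int) -> int:
    # Single-shard query: only the shard that owns this post is consulted.
    (n,) = shard_for(post_id).execute(
        "SELECT COUNT(*) FROM comments WHERE post_id = ?", (post_id,)
    ).fetchone()
    return n


def count_all() -> int:
    # Scatter-gather: every shard is queried and the partial counts are summed.
    return sum(
        db.execute("SELECT COUNT(*) FROM comments").fetchone()[0] for db in shards
    )


for cid, pid in enumerate([1, 1, 2, 3, 3, 3], start=1):
    add_comment(cid, pid)

print(count_for_post(3), count_all())  # 3 6
```

Summing numbers is the easy case of merging; a JOIN would have to bring matching rows together from different shards, which is the cost the answer describes.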

So you can see that, in practice, engineers pursue high performance not only with the alternative you are considering but also with more radical steps like this one.

Of course, if you have no performance issues, sharding or even denormalizing just makes life harder without any tangible benefit.
