Simple question table design

I try to think ahead a little and avoid unnecessary pain where possible.

I have run into this problem in past applications and have usually chosen what seemed the most correct approach, but I would like a few other opinions.

If you have a base table such as the one below, is it wise and/or more efficient to include a field holding a value that can be calculated from two other columns? For example:

    +-----+---------+------------+-------+--------+-------+
    | id  | room_id | bookdate   | price | people | total |
    +-----+---------+------------+-------+--------+-------+
    | 414 |     132 | 2010-03-01 | 14.55 |      2 | 29.10 |
    | 415 |     132 | 2010-03-02 | 14.55 |      2 | 29.10 |
    | 416 |     132 | 2010-03-03 | 14.55 |      2 | 29.10 |
    +-----+---------+------------+-------+--------+-------+

The value in the last field is just the product of the previous two, so it is redundant and unnecessary. Are there any cases where storing it is still useful?

+7

sql ruby-on-rails postgresql database-design

holden Feb 02 '10 at 17:04

11 answers

Maybe create a table containing all the fields except the last one, and then create a view that contains all the fields and computes the last one automatically?

The table would then contain only these fields:

    +-----+---------+------------+-------+--------+
    | id  | room_id | bookdate   | price | people |
    +-----+---------+------------+-------+--------+
    | 414 |     132 | 2010-03-01 | 14.55 |      2 |
    +-----+---------+------------+-------+--------+

And defining a view that calculates the total is also very simple:

 select *, price*people as total from rooms

(assuming your table is called rooms)
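A minimal sketch of this view approach, using Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs without a server. The view name rooms_with_total is made up for the example; the CREATE VIEW statement itself should carry over to PostgreSQL largely unchanged.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rooms (
        id       INTEGER PRIMARY KEY,
        room_id  INTEGER NOT NULL,
        bookdate TEXT    NOT NULL,
        price    REAL    NOT NULL,
        people   INTEGER NOT NULL
    );
    -- rooms_with_total is a hypothetical name; total is computed on read.
    CREATE VIEW rooms_with_total AS
        SELECT *, price * people AS total FROM rooms;
    INSERT INTO rooms VALUES (414, 132, '2010-03-01', 14.55, 2);
""")

# The view exposes total without the table ever storing it.
row = conn.execute(
    "SELECT id, price, people, total FROM rooms_with_total"
).fetchone()
print(row)
```

Because total is never stored, it cannot drift out of sync with price and people.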

+4

Gacek Feb 02 '10 at 17:07

The general rule is that you should not store what you can easily calculate. If profiling your application (rather than guessing) shows that this calculation is a performance bottleneck, then go ahead and store it.

+2

John topley Feb 02 '10 at 17:08

If you decide to denormalize for read performance, you can add a check constraint to enforce consistency.

    create table rooms (
        price  numeric,
        people numeric,
        total  numeric check (total = price * people)
    );

This will add a little overhead to inserts and updates.
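To see the constraint in action, here is a small sketch run against in-memory SQLite (a stand-in for PostgreSQL). Storing prices as integer cents is an assumption of the example, made so the equality test is not affected by floating-point rounding.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Integer cents are an assumption of this sketch, not part of the answer above.
conn.execute("""
    CREATE TABLE rooms (
        price_cents INTEGER NOT NULL,
        people      INTEGER NOT NULL,
        total_cents INTEGER NOT NULL
            CHECK (total_cents = price_cents * people)
    )
""")
conn.execute("INSERT INTO rooms VALUES (1455, 2, 2910)")  # consistent row

try:
    conn.execute("INSERT INTO rooms VALUES (1455, 2, 3000)")  # wrong total
    rejected = False
except sqlite3.IntegrityError:
    # The database refuses the inconsistent row.
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM rooms").fetchone()[0]
print(rejected, row_count)
```

The inconsistent insert is refused by the database itself, so no application code path can store a total that disagrees with price and people.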

+2

cope360 Feb 02 '10 at 23:48

I often advocate for a calculated field, assuming you do it properly by defining the field in the database as calculated. That way the calculation is always applied, however the data changes. I would only do this if you need these values in reports covering many records. It is certainly easy to write the formula in a query, but if you calculate this number often, you waste server resources (a calculated field only performs the calculation when the underlying data changes) and may seriously slow down a query that has to do the calculation for millions of records in a report. A materialized view is also a good idea (because it is precomputed), whereas a regular view only saves you from writing the expression repeatedly and does not have the performance of a calculated field. On the other hand, I never create views unless I have to (that is, unless I cannot solve the problem another way), since they can lead to real performance problems when people start building views on top of views. Do not use a hammer when a screwdriver is required.

Computed fields are powerful tools when used correctly and are often ignored by database developers.
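One way to define such a field is a stored generated column. The sketch below uses in-memory SQLite (3.31 or newer is assumed) as a stand-in; PostgreSQL 12 and later accept a similar GENERATED ALWAYS AS (...) STORED clause. Table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Requires SQLite 3.31+ for generated columns (an assumption of this sketch).
conn.execute("""
    CREATE TABLE booking (
        id          INTEGER PRIMARY KEY,
        price_cents INTEGER NOT NULL,
        people      INTEGER NOT NULL,
        total_cents INTEGER GENERATED ALWAYS AS (price_cents * people) STORED
    )
""")
conn.execute(
    "INSERT INTO booking (id, price_cents, people) VALUES (414, 1455, 2)"
)
before = conn.execute(
    "SELECT total_cents FROM booking WHERE id = 414"
).fetchone()[0]

# Changing an input column makes the database recompute the stored total.
conn.execute("UPDATE booking SET people = 3 WHERE id = 414")
after = conn.execute(
    "SELECT total_cents FROM booking WHERE id = 414"
).fetchone()[0]
print(before, after)
```

The stored value is recalculated only when a row is written, which is the property the answer relies on for large reports.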

+2

Hlgem Feb 03 '10 at 15:35

If you do this for convenience when writing queries, I would create a view that includes the total.

Otherwise, it is a matter of normalization. Sometimes denormalizing a table is acceptable. Denormalization, especially in environments such as data warehousing, can be used to improve performance. However, it is important to ensure that your data remains consistent. In other words, you need to update the total field whenever price or people changes.
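If the total is stored, one possible way to keep it consistent is a trigger. This is only a sketch against in-memory SQLite; the trigger name booking_sync_total and the integer-cents columns are assumptions, and PostgreSQL trigger syntax differs (it needs a trigger function).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE booking (
        id          INTEGER PRIMARY KEY,
        price_cents INTEGER NOT NULL,
        people      INTEGER NOT NULL,
        total_cents INTEGER NOT NULL
    );
    -- Hypothetical trigger: recomputes the stored total whenever
    -- price_cents or people is updated.
    CREATE TRIGGER booking_sync_total
    AFTER UPDATE OF price_cents, people ON booking
    BEGIN
        UPDATE booking
           SET total_cents = NEW.price_cents * NEW.people
         WHERE id = NEW.id;
    END;
    INSERT INTO booking VALUES (414, 1455, 2, 2910);
    UPDATE booking SET price_cents = 1500 WHERE id = 414;
""")

total = conn.execute(
    "SELECT total_cents FROM booking WHERE id = 414"
).fetchone()[0]
print(total)
```

Note that this trigger covers updates only; inserts still have to supply a correct total, or need a check constraint or a second trigger.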

In practice, I consider this a last resort, to be used only when other performance optimizations are not enough. In addition, denormalization does not guarantee an improvement: depending on the amount of data and other factors, it can actually make things worse.

Note: the table cannot be in 3NF (third normal form) until the calculated fields are removed.

+1

Andy west Feb 02 '10 at 17:11

If you are concerned about SELECT performance (at least with WHERE total = xx.xx), you can simply add an index.

CREATE INDEX booking_total ON booking ((price * people));

This will change the query plan for SELECT * from booking where price*people = 58.2; from this:

    Seq Scan on booking  (cost=0.00..299.96 rows=60 width=24) (actual time=0.015..2.926 rows=1 loops=1)
      Filter: ((price * (people)::double precision) = 58.2::double precision)
    Total runtime: 2.947 ms

to that

    Bitmap Heap Scan on booking  (cost=4.30..20.83 rows=5 width=24) (actual time=0.016..0.016 rows=1 loops=1)
      Recheck Cond: ((price * (people)::double precision) = 58.2::double precision)
      ->  Bitmap Index Scan on booking_total  (cost=0.00..4.29 rows=5 width=0) (actual time=0.009..0.009 rows=1 loops=1)
            Index Cond: ((price * (people)::double precision) = 58.2::double precision)
    Total runtime: 0.044 ms

PostgreSQL rocks :-)
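The same idea can be tried without a PostgreSQL server. The sketch below uses in-memory SQLite, which also supports indexes on expressions, and checks that the planner picks the index. The integer-cents columns and the generated sample rows are assumptions of the example, and SQLite's plan output looks different from the PostgreSQL plans above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE booking ("
    "id INTEGER PRIMARY KEY, price_cents INTEGER, people INTEGER)"
)
# Made-up sample rows so the planner has something to search.
conn.executemany(
    "INSERT INTO booking (price_cents, people) VALUES (?, ?)",
    [(1000 + i, 1 + i % 4) for i in range(1000)],
)
# Index on the computed expression instead of a stored total column.
conn.execute("CREATE INDEX booking_total ON booking (price_cents * people)")

# The fourth column of each EXPLAIN QUERY PLAN row is the readable detail.
plan = " ".join(
    row[3]
    for row in conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT * FROM booking WHERE price_cents * people = 5820"
    )
)
print(plan)
```

The query has to use the same expression as the index definition for the index to be considered.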

+1

Klette Feb 03 '10 at 15:22

I would go ahead and include the TOTAL field. From what I see here, there is no “DISCOUNT” or similar field that could reduce the total, but I can imagine scenarios in which price times the number of people does not match the total amount. You might want to consider a COMMENTS field, or even a table, so that someone can see why the amount does not match the product of the other fields.

Share and enjoy.

0

Bob jarvis Feb 02 '10 at 17:08

Basically, I prefer not to have a “total” field, or any field that is calculated from other fields, whether those are in the same table or in other tables. If the price field changes, someone may “forget” to update the total field, and you will end up with incorrect data.

Computing this field in a SELECT is very simple: SELECT price, people, (price * people) AS total FROM some_table;

The only case where I think keeping a calculated field makes sense is when it takes a long time to compute and recomputing it would overload a database holding a huge amount of data.

BR

0

aviv Feb 02 '10 at 17:09

It is generally considered bad practice to store fields that can simply be calculated from other fields in your table. The only time I would recommend it is when you need to save the result of a complex calculation and it is easier to store the calculated value than to recalculate it each time, but in your case this does not seem necessary.

Another problem with calculated fields is that the source values used in the calculation can change without the stored result changing, which creates potential problems in your application.

0

James goodwin Feb 02 '10 at 17:10

Since you can calculate the value, quite simply in this case, it is redundant. You should almost never store redundant data. It means that in every place where you update price or people, you must be sure to update the total as well. If you forget to do this in even one place, the data is now inconsistent. Suppose you now have a record that says price = $10, people = 3, total = $40. If you have different programs that display the information in different ways (different totals or subsets or whatever), the user may get different answers to the same question depending on how he asked it.

While it is bad to get a wrong answer, it is even worse to get the right answer sometimes and the wrong answer other times, because then it may not be clear how to fix the problem. I mean, if I see a certain customer showing 2 people when it should show 3, there is presumably some screen I can go to, change the 2 to a 3, click "Save" or whatever, and it is fixed. But if it says $10 times 2 people = $30, where do I go to fix it? How?

You could say that the record is only updated in one place, so there is no problem. But that is today. What if tomorrow you or some other programmer adds a new function that performs a different kind of update?
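A small sketch of how such drift happens and how it can be detected afterwards, again using in-memory SQLite as a stand-in. The table contents are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE booking (
        id INTEGER PRIMARY KEY, price INTEGER, people INTEGER, total INTEGER
    );
    INSERT INTO booking VALUES (1, 10, 3, 30);
    INSERT INTO booking VALUES (2, 10, 2, 20);
    -- A later code path raises the price but forgets the stored total.
    UPDATE booking SET price = 15 WHERE id = 2;
""")

# Rows whose stored total no longer matches price * people.
stale = conn.execute(
    "SELECT id, price, people, total FROM booking "
    "WHERE total <> price * people"
).fetchall()
print(stale)
```

A query like this can find inconsistent rows, but it cannot tell you which of the three columns holds the wrong value.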

I am currently working on a system filled with redundant data. Basic information about each of our company's products is stored in an "item" table. For each unit in stock we have a stock record, and instead of just referring to the item record, it copies all the item data for each stock unit. When an item is sold, we copy all the data into the sale record. If something is returned, we copy all the data into the return record. And so on, for several other types of records. This causes endless problems. Once a user ran a query searching for items with certain characteristics, and the list of hits included items that did not meet the search criteria. Why? Because the query finds all item records that match the search criteria and then tries to match those item records to stock records by part number ... but some of the stock records did not agree with the item record on other criteria, for various reasons. I am currently working on a problem where expense data is not always copied correctly from accounts to sale records. I would love to redesign the database to eliminate all the redundant data, but that would be a huge project.

Of course, there are times when the penalty for recalculating some piece of data is too high. For example, if you need to read thousands of transaction records to calculate the current balance, and you regularly want to display the current balance, that may be too much of a performance burden, and you are better off storing it redundantly. But I would do this very reluctantly. Make sure it really is a serious performance issue.

Multiplying together two numbers that are in a record you are already reading? Never. I cannot imagine that this would cause performance problems. If your database engine cannot multiply two numbers in a tiny fraction of the time it takes to read a record, get a new database engine.

0

Jay Feb 02 '10 at 17:48

Robert Greiner · Accepted Answer · 2010-02-02T17:08:00+0000

As a rule, I do not store values that can be calculated on the fly (especially those that are easy to calculate), unless there is a performance problem and I need to save some processing time.

This is the classic tradeoff between performance and storage. I would recommend calculating the value until you need a performance boost.
