Should the results be denormalized?

I work on a website with a simple normalized database.

There is a table called “Pages” and a table “Views”. Each time a page is viewed, a unique record of this view is written to the view table.

When displaying a page on a site, I use simple MySQL COUNT () to sum the number of views to display.

The database design seems beautiful, except for this problem: I don’t understand how to get the 10 most viewed pages among thousands.

Should I denormalize the Pages table by adding the Pages.views column to store the total number of views for each page? Or is there an efficient query method for the top 10 most viewed pages?

+4
source share
5 answers
SELECT p.pageid, count(*) as viewcount FROM pages p inner join views v on p.pageid = v.pageid group by p.pageid order by count(*) desc LIMIT 10 OFFSET 0; 

I can’t verify this, but something like that. I would not have kept the value unless I had to because of performance limitations (I just found out the term "premature optimization" and it seems to be applicable if you do this).

+8
source

It depends on the level of information that you are trying to save. If you want to record who watched when? Then a separate table is in order. Otherwise, the Views column is the way to go. In addition, if you save a separate column, you will find that the table will be locked more often, as each page view will try to update the column for the corresponding row.

 Select pageid, Count(*) as countCol from Views group by pageid order by countCol DESC LIMIT 10 OFFSET 0; 
+3
source

I would probably include a view column in the Pages table.

This seems to be a perfectly reasonable normalization violation for me. Moreover, I can’t imagine that you remove the views so that you do not expect the count to be removed from the blow. In this case, referential integrity does not seem supercritical.

+1
source

Database normalization is the most efficient / least redundant way to store data. This is good for transaction processing, but often directly conflicts with the need to efficiently receive data again. Usually the problem is solved by creating tables (indexes, materialized views, drive tables ...) with more accessible, pre-processed data. The (slightly dated) buzzword here is Data Warehouse.

I think you want your page table to be normalized, but you have an extra table with totals. Depending on how recent these calculations should be, you can update the table when updating the original table, or you may have a background setting for periodically recalculating the totals.

You also want to do this only if you are really faced with a performance problem that you will not, if you do not have a very large number of records or a very large number of concurrent accesses. Keep your code flexible so you can switch between a table and not have one.

+1
source

Denormalization will certainly work in this case. Your loss is an extra storage room spent by an extra column.

Alternatively, you can set up a scheduled task to populate this information at night, when your traffic will be low, x period of time.

In this case, you will lose the ability to instantly find out the number of pages if you do not complete this request manually.

Denormalization can be used to increase productivity.

- Kris

0
source

All Articles