Database Design: Performance of Row Count Totals

I have run into the following situation several times and wondered what best practice says about it:

Rows are inserted into a table as users perform certain actions. For example, each time a user visits a certain part of a website, a row is inserted with their IP address, username, and the URL they visited. Elsewhere, I want to show a summary of these actions. In this example, I want administrators to be able to log in to the website and see how many visits a particular user has made.

The most natural way to do this (IMO) is to insert a row for each visit and, each time an administrator asks for the totals, count the rows in the table for that user. However, in such situations there may be many thousands of rows per user. If administrators request totals often, the constant counting queries can put a strain on the database. So it seems the right approach is to keep inserting the individual rows, but also to store summary data with running totals as the rows are inserted (to avoid recomputing the totals again and again).
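The row-per-visit approach described above can be sketched roughly as follows, using Python's built-in sqlite3 module purely for illustration. The table name `visits`, its columns, the index name, and the helper functions `record_visit` and `count_visits` are all hypothetical, not taken from any real schema. The index on `username` is one assumption about how the count could be kept reasonably cheap.

```python
import sqlite3

# In-memory database, for illustration only.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per visit.
conn.execute(
    """
    CREATE TABLE visits (
        id       INTEGER PRIMARY KEY,
        ip       TEXT NOT NULL,
        username TEXT NOT NULL,
        url      TEXT NOT NULL
    )
    """
)
# An index on username lets the engine narrow the count to one user's rows
# instead of scanning the whole table.
conn.execute("CREATE INDEX idx_visits_username ON visits (username)")


def record_visit(conn, ip, username, url):
    """Insert one row for a single visit."""
    conn.execute(
        "INSERT INTO visits (ip, username, url) VALUES (?, ?, ?)",
        (ip, username, url),
    )


def count_visits(conn, username):
    """Count the rows for one user every time the total is requested."""
    row = conn.execute(
        "SELECT COUNT(*) FROM visits WHERE username = ?",
        (username,),
    ).fetchone()
    return row[0]


for i in range(3):
    record_visit(conn, "203.0.113.7", "alice", f"/page/{i}")
record_visit(conn, "203.0.113.9", "bob", "/page/0")

print(count_visits(conn, "alice"))
```

Every call to `count_visits` re-runs the count, which is the repeated work the question is worried about.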

What is the best practice, or the most common database schema, for this situation? You can ignore the specific example I made up; my real question is how to handle cases like this, which involve large volumes of data and frequently requested totals or calculations over that data.

1 answer

Here are a few practices; the one you choose will depend on your specific situation:

  • Trust your database engine. Many database engines automatically cache the query plans (and results) of frequently used queries. Even if the underlying data has changed, the query plan itself stays the same. The relevant parts of the indexes will be kept in main memory, which makes the query almost free. The most you may need to do in this case is tune the database configuration.

  • Denormalize your database. While Third Normal Form (3NF) is still considered the appropriate database design, for performance reasons it may be necessary to add extra tables that hold totals that would normally be calculated on demand with a SELECT ... GROUP BY ... query. These extra tables are often kept up to date using triggers, stored procedures, or background processes. See Wikipedia for more information on denormalization.

  • Data warehousing. With a data warehouse, the goal is to push copies of the real-time data to secondary databases (warehouses) for querying and ad hoc reporting. This is usually done with background processes using whatever replication methods your database supports. These warehouses are often indexed more heavily than your main application would need, with the intention of supporting large queries over huge amounts of historical data.
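As a rough illustration of the denormalization option, here is a minimal sketch in which a trigger keeps a totals table up to date, again using Python's sqlite3 module. The table names `visits` and `visit_totals`, the trigger name, and the helper functions are hypothetical, and the trigger syntax shown is SQLite's; other engines spell this differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: the detail table plus a denormalized totals table.
conn.executescript(
    """
    CREATE TABLE visits (
        id       INTEGER PRIMARY KEY,
        ip       TEXT NOT NULL,
        username TEXT NOT NULL,
        url      TEXT NOT NULL
    );

    CREATE TABLE visit_totals (
        username TEXT PRIMARY KEY,
        total    INTEGER NOT NULL
    );

    -- After each inserted visit, make sure the user has a totals row,
    -- then increment it.
    CREATE TRIGGER visits_after_insert
    AFTER INSERT ON visits
    BEGIN
        INSERT OR IGNORE INTO visit_totals (username, total)
        VALUES (NEW.username, 0);
        UPDATE visit_totals
        SET total = total + 1
        WHERE username = NEW.username;
    END;
    """
)


def summary_total(conn, username):
    """Read the precomputed total: a single primary-key lookup."""
    row = conn.execute(
        "SELECT total FROM visit_totals WHERE username = ?",
        (username,),
    ).fetchone()
    return row[0] if row else 0


def counted_total(conn, username):
    """Recount the detail rows, used here only to cross-check the summary."""
    return conn.execute(
        "SELECT COUNT(*) FROM visits WHERE username = ?",
        (username,),
    ).fetchone()[0]


conn.executemany(
    "INSERT INTO visits (ip, username, url) VALUES (?, ?, ?)",
    [
        ("203.0.113.7", "alice", "/a"),
        ("203.0.113.7", "alice", "/b"),
        ("203.0.113.9", "bob", "/a"),
    ],
)

print(summary_total(conn, "alice"), counted_total(conn, "alice"))
```

The trade-off in this sketch is that every insert does a little extra work so that reading a total no longer depends on how many visit rows exist. If rows in `visits` can be deleted or have their username changed, matching triggers would be needed to keep the totals correct.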


Source: https://habr.com/ru/post/1314103/

