Is pre-calculating data an example of denormalization? If not, then what is (in simple terms)?

I am trying to understand denormalization in databases, but almost all of the articles Google turns up are aimed at advanced DBAs. I know my way around MySQL and MSSQL, but I cannot get my head around this.

The only example I can think of where speed was a problem comes from the place where I interned: a calculation that involved about 2,500,000 rows across two tables. As you can guess, computing what was needed on every request bogged down the dev server for several minutes at a time. So shortly before I left, my supervisor and I decided to build a calculation table that would hold all the pre-calculated values and be refreshed roughly every hour (it was an internal site that was not used often). However, I left before I could finish it.
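
For what it's worth, here is a minimal sketch of that kind of calculation table in plain SQL; the table and column names (order_totals, orders, amount) are made up for illustration, not the actual schema from the internship:

    -- Hypothetical summary table holding the pre-calculated values.
    CREATE TABLE order_totals (
        customer_id   INT PRIMARY KEY,
        order_count   INT NOT NULL,
        total_amount  DECIMAL(12,2) NOT NULL,
        refreshed_at  DATETIME NOT NULL
    );

    -- Refresh, run by a scheduler roughly once an hour:
    DELETE FROM order_totals;

    INSERT INTO order_totals (customer_id, order_count, total_amount, refreshed_at)
    SELECT customer_id, COUNT(*), SUM(amount), CURRENT_TIMESTAMP
    FROM orders
    GROUP BY customer_id;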

Would this be an example of denormalization? If so, is it a good example, or does denormalization go much further than that? If not, what would be a simple example?

+4
5 answers

Suppose you have an Excel file with two sheets that you want to use to store family contact information. On the first sheet, you have the names of your contacts with their cell phone numbers. On the second sheet, you have mailing addresses for each family with landline phone numbers.

Now you want to print Christmas card labels for all your family contacts, listing every name but printing only one label per mailing address.

You need a way to link the two sets. All of the data in the two sets is normalized: each value is "atomic," a single piece of information that cannot be broken down further, and none of it is repeated.

In a denormalized representation of the two sets, you would have one list of all contacts, with the mailing addresses repeated several times (cousin Alan lives with uncle Bob at the same address, so that address appears on both Alan's row and Bob's).

To link them while staying normalized, you instead add a household ID to both sets. Each mailing address has a unique household ID, and each contact carries a household ID that can repeat (cousin Alan and uncle Bob, who live in the same house, share the same household ID).
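
In SQL terms, the normalized version of those two sheets might look something like this (a sketch; the table and column names are invented):

    -- One row per household: the mailing address and landline.
    CREATE TABLE household (
        household_id    INT PRIMARY KEY,
        mailing_address VARCHAR(200) NOT NULL,
        landline_phone  VARCHAR(20)
    );

    -- One row per contact, pointing at the household it belongs to.
    CREATE TABLE contact (
        contact_id   INT PRIMARY KEY,
        name         VARCHAR(100) NOT NULL,
        cell_phone   VARCHAR(20),
        household_id INT NOT NULL REFERENCES household (household_id)
    );

    -- The Christmas-card labels: every name, grouped by mailing address.
    SELECT h.mailing_address, c.name
    FROM household h
    JOIN contact c ON c.household_id = h.household_id
    ORDER BY h.household_id, c.name;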

Now suppose we are doing this at work and need to track a large number of contacts and households. Keeping the data normalized is great for maintenance purposes, because we store contact data and household data in only one place each: when we update an address, we update it for all related contacts. Unfortunately, when we ask the server to join the two related sets, performance suffers and the query takes forever.

So a developer comes along and creates one denormalized table with millions of rows, one for each contact, with the household details repeated on every row. Performance improves, and space considerations go right out the window, since we now need room for 3 million rows of partly duplicated data instead of the two lean sets.
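
A sketch of the denormalized table that developer might build (again with invented names); the household columns are simply copied onto every contact row, so no join is needed afterwards:

    CREATE TABLE contact_denormalized (
        contact_id      INT PRIMARY KEY,
        name            VARCHAR(100) NOT NULL,
        cell_phone      VARCHAR(20),
        mailing_address VARCHAR(200) NOT NULL,  -- repeated for everyone in the household
        landline_phone  VARCHAR(20)             -- repeated as well
    );

    -- Populate it once from the normalized tables.
    INSERT INTO contact_denormalized
    SELECT c.contact_id, c.name, c.cell_phone, h.mailing_address, h.landline_phone
    FROM contact c
    JOIN household h ON h.household_id = c.household_id;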

Make sense?

+2

I would call what you describe aggregation, not denormalization (if it is a count of orders, for example SUM(orders) per day...). OLAP is typically used for that. Denormalization would be, for example: instead of a PhoneType lookup table and a PhoneTypeID column in the Contact table, you simply store PhoneType directly in the Contact table, thereby eliminating one join.
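
A sketch of that PhoneType example (the column names are my guess at what the answer has in mind):

    -- Normalized: Contact stores only a key into a lookup table.
    CREATE TABLE PhoneType (
        PhoneTypeID INT PRIMARY KEY,
        PhoneType   VARCHAR(20) NOT NULL  -- e.g. 'Home', 'Mobile', 'Work'
    );

    CREATE TABLE Contact (
        ContactID   INT PRIMARY KEY,
        Name        VARCHAR(100) NOT NULL,
        Phone       VARCHAR(30),
        PhoneTypeID INT REFERENCES PhoneType (PhoneTypeID)
    );

    -- Denormalized: store the text directly in Contact and skip the join.
    CREATE TABLE ContactDenormalized (
        ContactID INT PRIMARY KEY,
        Name      VARCHAR(100) NOT NULL,
        Phone     VARCHAR(30),
        PhoneType VARCHAR(20)             -- 'Home', 'Mobile', 'Work' repeated per row
    );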

Of course, you can also use indexed/materialized views to aggregate the values... but then you slow down your UPDATEs, DELETEs, and INSERTs.
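
For example, an indexed view in SQL Server (a sketch; it assumes a dbo.Orders table whose Amount column is NOT NULL, which indexed views require for SUM):

    -- Indexed views must be schema-bound and use COUNT_BIG(*) when they have GROUP BY.
    CREATE VIEW dbo.DailyOrderTotals
    WITH SCHEMABINDING
    AS
    SELECT OrderDate,
           COUNT_BIG(*) AS OrderCount,
           SUM(Amount)  AS TotalAmount
    FROM dbo.Orders
    GROUP BY OrderDate;
    GO

    -- The unique clustered index is what actually materializes the view on disk.
    CREATE UNIQUE CLUSTERED INDEX IX_DailyOrderTotals
        ON dbo.DailyOrderTotals (OrderDate);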

Triggers are another way of doing this.
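
For instance, a minimal sketch of a trigger that keeps a daily count in step with inserts (SQL Server syntax; both table names are hypothetical, and it only handles dates that already have a summary row):

    CREATE TRIGGER trg_Orders_Insert
    ON dbo.Orders
    AFTER INSERT
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Add the newly inserted rows to the running count for each date.
        UPDATE t
        SET    t.OrderCount = t.OrderCount + i.NewOrders
        FROM   dbo.DailyOrderCounts AS t
        JOIN   (SELECT OrderDate, COUNT(*) AS NewOrders
                FROM   inserted
                GROUP  BY OrderDate) AS i
               ON i.OrderDate = t.OrderDate;
    END;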

+1

In an overly simplified form, I would describe de-normalization as reducing the number of tables used to represent the same data.

Customers and addresses are often stored in different tables so that a single customer can have multiple addresses (work, home, current address, previous address, etc.).

The same could apply to surnames and other properties, but often only the current surname is ever of interest. So you could normalize all the way down to a Customer table and a Surname table, with foreign key relationships and so on, but then denormalize by joining the two tables back together.
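
As a sketch (with invented names), the fully normalized version and the denormalized result might look like this:

    -- Normalized all the way: the surname lives in its own table.
    CREATE TABLE Surname (
        SurnameID INT PRIMARY KEY,
        Surname   VARCHAR(100) NOT NULL
    );

    CREATE TABLE Customer (
        CustomerID  INT PRIMARY KEY,
        FirstName   VARCHAR(100) NOT NULL,
        DateOfBirth DATE,
        SurnameID   INT NOT NULL REFERENCES Surname (SurnameID)
    );

    -- Denormalized: the two tables joined back into one.
    CREATE TABLE CustomerDenormalized (
        CustomerID  INT PRIMARY KEY,
        FirstName   VARCHAR(100) NOT NULL,
        Surname     VARCHAR(100) NOT NULL,
        DateOfBirth DATE
    );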

The advantage of “normalizing until it hurts” is that it forces you to consider a clean and (hopefully) complete representation of the data and its possible behaviours and relationships.

The advantage of “de-normalizing until it works” is that it cuts some maintenance and/or processing costs while still adhering to the same basic model you arrived at when designing the normalized version.

In the surname example, denormalizing lets you add an index on customers by surname and date of birth. Without denormalizing, the surname and the DoB live in different tables, so a composite index across both is not possible.
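
With the denormalized table sketched above, the composite index is straightforward:

    -- Possible only because surname and date of birth now sit in the same table.
    CREATE INDEX IX_Customer_Surname_DoB
        ON CustomerDenormalized (Surname, DateOfBirth);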

+1

Denormalization can be useful, and the example you gave is an instance of it. When it is impractical to calculate the values dynamically because the query is too expensive, you create a table that holds a foreign key referencing the other table along with the pre-calculated value.

The data is redundant because it can be derived from the other table, but given the production requirements, it is the better design in practical terms.
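
A minimal sketch of that design (hypothetical names), mirroring the calculation table from the question:

    -- The calculated value stored next to a foreign key into the source table.
    CREATE TABLE customer_calculations (
        customer_id      INT PRIMARY KEY REFERENCES customers (customer_id),
        calculated_total DECIMAL(12,2) NOT NULL
    );

    -- Readers get the expensive result with a cheap lookup instead of a big query.
    SELECT cu.name, cc.calculated_total
    FROM customers cu
    JOIN customer_calculations cc ON cc.customer_id = cu.customer_id;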

It will be interesting to see what others have to say on this subject, because I know my SQL professor would cringe at the term "denormalize," but it does have its practical uses.

0

Normal form would reject that table because it can be derived entirely from existing data. Yet for performance reasons this kind of data is created all the time. For example, inventory balances are usually stored and carried forward rather than re-derived from the transactions that created them.

For smaller, faster data sets, you can use a view to provide the aggregate. This gives the user the data they need (the aggregated value) without forcing them to compute the aggregation themselves. Oracle (and others?) introduced materialized views to do exactly what your supervisor suggested; they can be refreshed on various schedules.
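
For example, in Oracle a materialized view can be told to rebuild itself on a schedule (a sketch; the orders table and the hourly interval are assumptions chosen to match the question):

    -- Fully refreshed every hour, roughly what the hourly calculation table would do.
    CREATE MATERIALIZED VIEW order_totals_mv
        REFRESH COMPLETE
        START WITH SYSDATE
        NEXT SYSDATE + 1/24   -- 1/24 of a day = one hour
    AS
    SELECT customer_id,
           COUNT(*)    AS order_count,
           SUM(amount) AS total_amount
    FROM   orders
    GROUP  BY customer_id;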

If the update volume allows it, triggers can be used to emulate a materialized view with an ordinary table. This can reduce the cost of maintaining the aggregated value, although it does so by spreading the overhead over a longer period of time. It also adds the risk of creating a deadlock condition.

OLAP takes this simple case much further for aggregates. Analysts are interested in the aggregated values, not the details; however, when an aggregated value looks interesting, they can drill down into the details. Starting from normal form is still good practice.

0
