Using a natural key or using surrogate keys and audit tables for an audit / change log

This is my first question here, so please be nice!

I am a junior developer with little experience, and I am struggling with this problem.

I have a table that needs to be audited. Say this table records the phone calls made by a call center (it does not, but it serves as an example). I will call it "CallHistory".

I originally planned to keep a separate table called "Callee", which holds the name of the called party, their phone number, and so on. This table would use a surrogate primary key.

The CallHistory table would have a foreign key to the Callee table.

I initially did this so that if I changed a callee's phone number, the change would propagate throughout the system, and I would not have to update the phone number in several tables.

The problem is that the entire point of the CallHistory table is to record the HISTORY of calls, including misdialed calls (for example, the caller dialed the wrong number). That history would be lost with this surrogate key approach.

One of the senior developers at work suggested storing a copy of the callee's phone number, as it was at that particular time, in each CallHistory row in order to preserve the history.

I was thinking about keeping an audit / change log table for the same purpose.
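For readers unfamiliar with the idea, an audit / change log table could look roughly like the sketch below. It uses Python's built-in sqlite3 module; the CalleeAudit table, its columns, the trigger name, and the sample data are all hypothetical and not part of the original design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Callee (
    CalleeID    INTEGER PRIMARY KEY,   -- surrogate key
    Name        TEXT NOT NULL,
    PhoneNumber TEXT NOT NULL
);

-- Hypothetical change log: one row per phone number change.
CREATE TABLE CalleeAudit (
    AuditID        INTEGER PRIMARY KEY,
    CalleeID       INTEGER NOT NULL,
    OldPhoneNumber TEXT NOT NULL,
    NewPhoneNumber TEXT NOT NULL,
    ChangedAt      TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- Record the old and new value whenever the phone number actually changes.
CREATE TRIGGER Callee_phone_audit
AFTER UPDATE OF PhoneNumber ON Callee
WHEN OLD.PhoneNumber <> NEW.PhoneNumber
BEGIN
    INSERT INTO CalleeAudit (CalleeID, OldPhoneNumber, NewPhoneNumber)
    VALUES (OLD.CalleeID, OLD.PhoneNumber, NEW.PhoneNumber);
END;
""")

conn.execute(
    "INSERT INTO Callee (CalleeID, Name, PhoneNumber) VALUES (1, 'Alice', '555-0100')"
)
conn.execute("UPDATE Callee SET PhoneNumber = '555-0199' WHERE CalleeID = 1")

audit_rows = conn.execute(
    "SELECT CalleeID, OldPhoneNumber, NewPhoneNumber FROM CalleeAudit"
).fetchall()
print(audit_rows)
```

Note that a log like this only captures edits to the Callee record. A misdialed number never appears in Callee, so it would not show up here.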

Will my approach be sufficient for this purpose, or am I completely off track? Which approach do you prefer?

Cheers, Andrew

+4
3 answers

I agree with Rick. Yes, redundant data is very, very bad, evil, smelly and otherwise undesirable. But just because two fields are both called "phone number" does not make them the same. “Current customer phone number” and “Customer phone number at the time we last talked to him” do not necessarily match.

I am currently working with a database that stores information about sales and items. An item record includes information such as description, stock number, and price. Our sale records also include a description, stock number and price. The description and stock number are redundant and should be eliminated; that was bad design. But the price belongs in both places. There is a big difference between the current price and the price at the time of a given sale. That sale could have been many years ago, and the price may have changed a dozen times since then.

As a rule, in an application like the one you describe, I would just put the phone number in the history table and be done with it. There is not much to be gained by creating a “Phone Number History” table and linking each call to the corresponding phone number record. It might save a few bytes per record, but it adds complexity.

However, if there are several related fields, the story changes. Suppose (I am just making up an example here to give the idea) you are an insurance company, and your coverage terms vary by location because of different state laws, the doctors available in the area, and so on, so that when a client moves, their policy needs to be rewritten. Now the phone number is tied to many other data elements, so everything should go in one table, and you reference the corresponding record. Otherwise, you might end up with a New Jersey phone number linked to California policy terms.

+1

I think you are being tripped up by a subtlety of normal forms here. The phone number associated with the callee is not the same piece of information as the number the caller actually dialed. They will usually have the same value, but that is a separate matter.

So, in my opinion, CallHistory should have both the number dialed and a reference to the Callee table.
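A minimal sketch of that layout, using Python's built-in sqlite3 module, might look like this. The column names NumberDialed and CalledAt and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Callee (
    CalleeID    INTEGER PRIMARY KEY,   -- surrogate key
    Name        TEXT NOT NULL,
    PhoneNumber TEXT NOT NULL          -- the callee's CURRENT number
);
CREATE TABLE CallHistory (
    CallID       INTEGER PRIMARY KEY,
    CalleeID     INTEGER NOT NULL REFERENCES Callee(CalleeID),
    NumberDialed TEXT NOT NULL,        -- what was actually dialed at the time
    CalledAt     TEXT NOT NULL
);
""")

conn.execute("INSERT INTO Callee VALUES (1, 'Alice', '555-0100')")
conn.execute("INSERT INTO CallHistory VALUES (1, 1, '555-0100', '2024-01-05 09:00')")
# A misdial: the intended callee is Alice, but the last digit is wrong.
conn.execute("INSERT INTO CallHistory VALUES (2, 1, '555-0101', '2024-01-06 09:00')")

# Alice changes her number; only the Callee row is updated.
conn.execute("UPDATE Callee SET PhoneNumber = '555-0199' WHERE CalleeID = 1")
conn.execute("INSERT INTO CallHistory VALUES (3, 1, '555-0199', '2024-03-10 14:30')")

history = conn.execute(
    "SELECT CallID, NumberDialed FROM CallHistory ORDER BY CallID"
).fetchall()
current_number = conn.execute(
    "SELECT PhoneNumber FROM Callee WHERE CalleeID = 1"
).fetchone()[0]
print(history)
print(current_number)
```

The update touches a single row in Callee, while each CallHistory row still shows the number that was dialed on that call, including the misdial.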

+2

Your question is a very typical design dilemma. For example, suppose you have a database in normal form with the following tables: sales, managers (who make the sales) and regions (where the managers work). You create reports such as "Annual sales grouped by region", where you join sales with managers and managers with regions. But if one of the managers moves to another office during the year, your report will show incorrect data, right?
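The effect can be reproduced with a small sketch using Python's built-in sqlite3 module. The table layout follows the description above; the column names, amounts, and dates are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE regions  (region_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE managers (manager_id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                       region_id INTEGER NOT NULL REFERENCES regions(region_id));
CREATE TABLE sales    (sale_id INTEGER PRIMARY KEY,
                       manager_id INTEGER NOT NULL REFERENCES managers(manager_id),
                       amount INTEGER NOT NULL, sold_on TEXT NOT NULL);

INSERT INTO regions  VALUES (1, 'East'), (2, 'West');
INSERT INTO managers VALUES (1, 'Dana', 1);
INSERT INTO sales    VALUES (1, 1, 100, '2024-02-01');
""")

# "Annual sales grouped by region": sales -> managers -> regions.
REPORT = """
SELECT r.name, SUM(s.amount)
FROM sales s
JOIN managers m ON m.manager_id = s.manager_id
JOIN regions  r ON r.region_id  = m.region_id
GROUP BY r.name
ORDER BY r.name
"""

before_move = conn.execute(REPORT).fetchall()

# Dana moves from East to West mid-year, then makes another sale.
conn.execute("UPDATE managers SET region_id = 2 WHERE manager_id = 1")
conn.execute("INSERT INTO sales VALUES (2, 1, 50, '2024-09-01')")

after_move = conn.execute(REPORT).fetchall()
print(before_move)  # [('East', 100)]
print(after_move)   # [('West', 150)]
```

After the move, the February sale made in East is reported under West, because the join only sees the manager's current region.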

What are the solutions? There are three.

1) In some cases, the developers and analysts decide: fine, our historical data will not be entirely accurate, but the current data is correct, and we want to stay in normal form and not duplicate data. This solution is the least complicated. In this case, you create the Callee and CallHistory tables in normal form, i.e. the phone number is stored only in the Callee table.

2) There is a requirement not to lose any historical changes, and we want our reports and queries to be very fast (because of the size of the database). In this case, people decide to duplicate all the fields. For example, you can create a CallHistory table that has the phone number, callee name, address, etc., since you expect each of these fields to change in the future. Of course, you can also create a Callee table (you may need it for other purposes), but CallHistory may or may not reference it. Suppose some entries need to be deleted from Callee, but you want to keep them in CallHistory. This is the case where developers often decide they can forgo referential integrity and not create any foreign keys on the CallHistory table. That is reasonable, because inserts are faster without foreign keys.

3) This is the approach I prefer, but it is the most difficult to implement: the CallHistory table references a CalleeHistory table. The CalleeHistory table holds every version of every record in the Callee table, identified by a composite key, for example CalleeID + DateModified (some developers use a ModificationVersionNumber instead of DateModified). In CallHistory, we have a foreign key that references CalleeID + DateModified. In this case, your data is normalized (i.e. the phone number is not duplicated across tables), and you have not lost any historical changes.
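The third approach can be sketched as follows with Python's built-in sqlite3 module. This is only one possible reading of it: it uses a VersionNumber column rather than DateModified, leaves out the current-state Callee table for brevity, and the add_version helper and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE CalleeHistory (
    CalleeID      INTEGER NOT NULL,
    VersionNumber INTEGER NOT NULL,
    Name          TEXT NOT NULL,
    PhoneNumber   TEXT NOT NULL,
    PRIMARY KEY (CalleeID, VersionNumber)
);
CREATE TABLE CallHistory (
    CallID        INTEGER PRIMARY KEY,
    CalleeID      INTEGER NOT NULL,
    VersionNumber INTEGER NOT NULL,
    CalledAt      TEXT NOT NULL,
    -- Each call points at the exact callee version that was current.
    FOREIGN KEY (CalleeID, VersionNumber)
        REFERENCES CalleeHistory (CalleeID, VersionNumber)
);
""")


def add_version(conn, callee_id, name, phone):
    """Append a new version row instead of updating in place; return its number."""
    (latest,) = conn.execute(
        "SELECT COALESCE(MAX(VersionNumber), 0) FROM CalleeHistory WHERE CalleeID = ?",
        (callee_id,),
    ).fetchone()
    conn.execute(
        "INSERT INTO CalleeHistory VALUES (?, ?, ?, ?)",
        (callee_id, latest + 1, name, phone),
    )
    return latest + 1


v1 = add_version(conn, 1, "Alice", "555-0100")
conn.execute("INSERT INTO CallHistory VALUES (1, 1, ?, '2024-01-05 09:00')", (v1,))

# The phone number changes: a new version is appended, the old one is kept.
v2 = add_version(conn, 1, "Alice", "555-0199")
conn.execute("INSERT INTO CallHistory VALUES (2, 1, ?, '2024-03-10 14:30')", (v2,))

calls = conn.execute("""
SELECT c.CallID, h.PhoneNumber
FROM CallHistory c
JOIN CalleeHistory h
  ON h.CalleeID = c.CalleeID AND h.VersionNumber = c.VersionNumber
ORDER BY c.CallID
""").fetchall()
print(calls)
```

Each call resolves to the phone number that was on record when the call was made, and the number itself is stored only in CalleeHistory.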

As always, there are trade-offs among implementation complexity, database performance, database size, data integrity, and the functional requirements of the system. If you are a junior developer, it is good to keep all the possible solutions in mind, but you should probably listen to the senior developer, who knows more about your system and requirements than anyone on Stack Overflow.

P.S. If you want to learn about other approaches, read about slowly changing dimensions, for example http://en.wikipedia.org/wiki/Slowly_changing_dimension

+1
