Normalize or denormalize: should phone numbers be stored in a separate table? What about search performance?

I am developing a database application that stores simple contact information (first / last name, etc.), and I also need to store phone numbers. In addition to each phone number, I have to store what it is for (mobile, business, etc.) and, possibly, an optional comment for each number.

My first approach was to normalize and store the phone numbers in a separate table, so that I would have a "Contacts" table and a "PhoneNumbers" table. The PhoneNumbers table would look like this:

 Id          int PK
 ContactId   int FK<->Contacts.Id
 PhoneNumber nvarchar(22)
 Description nvarchar(100)
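This two-table design can be sketched with Python's built-in sqlite3 module (column types are simplified to SQLite's TEXT, and the sample contact is made up for illustration):

```python
import sqlite3

# In-memory database; types simplified (SQLite TEXT instead of nvarchar).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Contacts (
    Id        INTEGER PRIMARY KEY,
    FirstName TEXT,
    LastName  TEXT
);
CREATE TABLE PhoneNumbers (
    Id          INTEGER PRIMARY KEY,
    ContactId   INTEGER REFERENCES Contacts(Id),
    PhoneNumber TEXT,
    Description TEXT
);
""")
conn.execute("INSERT INTO Contacts (Id, FirstName, LastName) "
             "VALUES (1, 'Ada', 'Lovelace')")
conn.execute("INSERT INTO PhoneNumbers (ContactId, PhoneNumber, Description) "
             "VALUES (1, '555-0100', 'mobile')")

# Searching by number touches exactly one column, no matter how many
# numbers a contact has.
row = conn.execute("""
    SELECT c.FirstName, c.LastName, p.Description
    FROM PhoneNumbers p JOIN Contacts c ON c.Id = p.ContactId
    WHERE p.PhoneNumber = ?
""", ("555-0100",)).fetchone()
print(row)  # ('Ada', 'Lovelace', 'mobile')
```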

However, it would simplify the code and spare me a SQL join if I just stored this information as part of each contact record (provided that I limit the total number of phone numbers that can be stored, say to 4).

But that leaves me with an ugly structure:

 PhoneNumber1 nvarchar(22)
 Description1 nvarchar(100)
 PhoneNumber2 nvarchar(22)
 Description2 nvarchar(100)

and so on.

It looks amateurish to me, but here are the benefits I see:

1) In ASP.NET MVC, I can bind the input text fields directly to my LINQ object properties, and adding and updating records just works.

2) No join query is required to retrieve the phone numbers.

Unfortunately, I am not well versed in issues such as table width (I have read that tables that are too wide or have too many columns can cause performance problems?). Also, when searching for a phone number I would need to search 4 fields instead of 1, as I would if it were stored in a separate table.

In my application, about 80% of usage is data searches / lookups, so search efficiency is an important factor.

I would appreciate your help finding the right approach here. Split into a separate table, or keep it all in one? Thanks!

+4
6 answers

Denormalizing the data is unlikely to cause you problems, but I would not suggest it. Even though it may be harder to query, it is better to have well-formed data that you can manipulate in many ways. I would suggest a database schema like this:

 Contacts:
     ID (Primary Key)
     Name
     Job Title

 Phone Number Categories:
     ID (Primary Key)
     Name

 Phone Numbers:
     ID (Primary Key)
     Category_ID (Foreign Key -> Phone Number Categories.ID)
     Contact_ID (Foreign Key -> Contacts.ID)
     Phone Number

This gives you a lot of flexibility in the number of phone numbers allowed and lets you categorize them.
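Here is a minimal sketch of that three-table schema in SQLite via Python's sqlite3 (table and column names follow the answer above; the types and sample data are assumptions for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Contacts (ID INTEGER PRIMARY KEY, Name TEXT, JobTitle TEXT);
CREATE TABLE PhoneNumberCategories (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE PhoneNumbers (
    ID          INTEGER PRIMARY KEY,
    Category_ID INTEGER REFERENCES PhoneNumberCategories(ID),
    Contact_ID  INTEGER REFERENCES Contacts(ID),
    PhoneNumber TEXT
);
""")
conn.executemany("INSERT INTO PhoneNumberCategories VALUES (?, ?)",
                 [(1, "Mobile"), (2, "Business")])
conn.execute("INSERT INTO Contacts VALUES (1, 'Bob', 'Engineer')")
conn.executemany("INSERT INTO PhoneNumbers VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, "555-0101"), (2, 2, 1, "555-0202")])

# A contact can have any number of numbers, each tagged with a category.
rows = conn.execute("""
    SELECT c.Name, cat.Name, p.PhoneNumber
    FROM PhoneNumbers p
    JOIN Contacts c ON c.ID = p.Contact_ID
    JOIN PhoneNumberCategories cat ON cat.ID = p.Category_ID
    ORDER BY p.ID
""").fetchall()
print(rows)  # [('Bob', 'Mobile', '555-0101'), ('Bob', 'Business', '555-0202')]
```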

+9

Now this may be fine for the moment, but what happens when someone wants a fifth phone number? Do you just keep adding more fields?

Another thing to consider: how do you write a query for "Give me all the people and their mobile numbers," or "Give me everyone without a phone number"? With a separate table this is easy, but with one table a mobile number could be in any of the four fields, so it becomes much more complicated.
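To make the point concrete, here is a sketch (SQLite via Python's sqlite3; table name, slot count, and sample data are invented, and it assumes at most one mobile number per row) of what "all the people and their mobile numbers" looks like against the wide single-table layout, where every slot has to be checked:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Denormalized layout: four number/description pairs on the contact row.
conn.executescript("""
CREATE TABLE WideContacts (
    Id INTEGER PRIMARY KEY, Name TEXT,
    PhoneNumber1 TEXT, Description1 TEXT,
    PhoneNumber2 TEXT, Description2 TEXT,
    PhoneNumber3 TEXT, Description3 TEXT,
    PhoneNumber4 TEXT, Description4 TEXT
);
""")
conn.execute("""INSERT INTO WideContacts
    VALUES (1, 'Carol', '555-1', 'business', '555-2', 'mobile',
            NULL, NULL, NULL, NULL)""")

# The mobile number can live in any of the four slots, so the query must
# probe each one; with a separate PhoneNumbers table this is a plain join.
mobile = conn.execute("""
    SELECT Name,
           CASE WHEN Description1 = 'mobile' THEN PhoneNumber1
                WHEN Description2 = 'mobile' THEN PhoneNumber2
                WHEN Description3 = 'mobile' THEN PhoneNumber3
                WHEN Description4 = 'mobile' THEN PhoneNumber4
           END AS Mobile
    FROM WideContacts
    WHERE 'mobile' IN (Description1, Description2, Description3, Description4)
""").fetchall()
print(mobile)  # [('Carol', '555-2')]
```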

If you take the normalized approach and later want to store additional data about a phone number, you can simply add one column to the phone numbers table, rather than adding 4 columns to the contacts table.

Returning to the first point about adding more phone numbers in the future: if you do add more number columns, you will probably have to amend every query / piece of logic / form that works with phone number data.

+6

I am a proponent of the normalized approach. What if you decide you want to add an "Extension" column for phone numbers? You would need to create columns "Extension1", "Extension2", "Extension3", etc., which becomes quite tedious to maintain at some point.

Then again, I don't think you can go too far wrong either way. It is not as if switching between normalized and denormalized later would take that long, should you decide to change methods.

+4

An important principle of denormalization is that it does not sacrifice the normalized data. You should always start with a schema that accurately describes your data. Thus, you should place different kinds of information in different tables. You should also declare as many constraints on your data as you consider reasonable.

All of these goals tend to make queries a little longer to write, since you have to join the different tables to get the information you need, but with sensible names for tables and columns this should not hurt readability.

More importantly, these goals can affect performance. You should monitor your actual workload to make sure your database is keeping up. If almost all of your queries return quickly, and you have plenty of CPU headroom for more queries, then you're done.

If you find that write queries are taking too long, be careful about denormalizing your data. You would make the database work harder to maintain consistency, since each write may require many reads followed by many more writes. Instead, look at your indexes. Do you have indexes on columns you rarely query? Do you have the indexes needed to verify integrity during an update?

If your read queries are the bottleneck, then again start by looking at your indexes. Do you need to add an index or two to avoid table scans? If you simply can't avoid table scans, is there anything you can do to make each row smaller, such as decreasing the number of characters in a varchar column, or splitting rarely queried columns into another table that is joined in only when needed?
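As a small illustration of the index point (a sketch in SQLite via Python's sqlite3; the table, index name, and query are invented for the example), `EXPLAIN QUERY PLAN` shows the same lookup going from a full table scan to an index search once an index exists:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE PhoneNumbers (
    Id INTEGER PRIMARY KEY, ContactId INTEGER, PhoneNumber TEXT)""")

def plan(sql):
    # EXPLAIN QUERY PLAN reports whether SQLite scans the table or uses
    # an index; the human-readable detail is the fourth column.
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[0][3]

query = "SELECT ContactId FROM PhoneNumbers WHERE PhoneNumber = '555-0100'"
before = plan(query)   # a scan of the whole table
conn.execute("CREATE INDEX idx_phone ON PhoneNumbers(PhoneNumber)")
after = plan(query)    # a search using idx_phone
print(before)
print(after)
```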

If there is one particular slow query that always uses the same join, that query may benefit from denormalization. First, make sure that reads of those tables greatly outnumber writes. Determine which columns you need to copy from one table into the other. You might want to use slightly different names for those columns, so that it is more obvious that they are denormalized. Change your write logic to update both the source table used in the join and the denormalized fields.

It is important to note that you do not delete the old table. The problem with denormalized data is that, while it speeds up the particular query it was designed for, it tends to complicate other queries. In particular, write queries have to do more work to keep the data consistent, whether by copying data from table to table, by adding extra subqueries to make sure the data is correct, or by jumping through other kinds of hoops. By preserving the original table, you can leave all of your old constraints in place, so at least those source columns are always valid. If for some reason you find that the denormalized columns have fallen out of sync, you can fall back to the original, slower query and still get correct results, and then work on ways to rebuild the denormalized data.

+3

I agree with @Tom that normalization makes more sense and provides flexibility. If you set up your indexes correctly, you should not suffer much when joining the tables.

As for your normalized table, I would add a type or code field so that you can identify each number: Home, Home 1, Home 2, Business, Bus 1, Bus 2, Mobile, Mob 1, etc.

 Id          int PK
 ContactId   int FK<->Contacts.Id
 Code        char(5)
 PhoneNumber nvarchar(22)
 Description nvarchar(100)

And store that type in a separate table holding the code and its description.

We tend to have a code group table with information like

 CODE_GROUP  CODE_DESC
 ST          State
 PH          Phone Number
 AD          Address Type

And the CODE table with

 CODE_ID  CODE_GROUP  DESCRIPTION
 MB1      PH          Mobile One
 MB2      PH          Mobile Two
 NSW      ST          New South Wales
 etc...

You can extend this to include long descriptions, short descriptions, ordering, filtering, etc.
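The code-group lookup pattern described above can be sketched like this (SQLite via Python's sqlite3; column types and the primary/foreign keys are assumptions, since the answer only lists the columns):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CodeGroup (
    CODE_GROUP TEXT PRIMARY KEY,
    CODE_DESC  TEXT
);
CREATE TABLE Code (
    CODE_ID     TEXT PRIMARY KEY,
    CODE_GROUP  TEXT REFERENCES CodeGroup(CODE_GROUP),
    DESCRIPTION TEXT
);
""")
conn.executemany("INSERT INTO CodeGroup VALUES (?, ?)",
                 [("ST", "State"), ("PH", "Phone Number"),
                  ("AD", "Address Type")])
conn.executemany("INSERT INTO Code VALUES (?, ?, ?)",
                 [("MB1", "PH", "Mobile One"),
                  ("MB2", "PH", "Mobile Two"),
                  ("NSW", "ST", "New South Wales")])

# All codes that classify phone numbers:
phone_codes = conn.execute(
    "SELECT CODE_ID, DESCRIPTION FROM Code "
    "WHERE CODE_GROUP = 'PH' ORDER BY CODE_ID"
).fetchall()
print(phone_codes)  # [('MB1', 'Mobile One'), ('MB2', 'Mobile Two')]
```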

+1

What about an XML field in the contact table? That avoids the complexity of another table.

(Please correct me if this is a bad idea; I have never worked with XML fields before.)

0
