What is the best database design: more tables or more columns?

A former colleague insisted that a database with more tables and fewer columns is always better than one with fewer tables and more columns. For example, instead of a customer table with name, address, city, state, postal code, and so on, you would have a name table, an address table, a city table, etc.

He claimed this design was more efficient and flexible. Perhaps it is more flexible, but I can't speak to its efficiency. And even if it is more efficient, I think those gains can be outweighed by the added complexity.

So, are there significant advantages to more tables with fewer columns over fewer tables with more columns?

+57
database database-design normalization
Sep 12 '08 at 16:45
18 answers

I have some fairly simple rules of thumb that I follow when designing databases, which I think can be used to help make decisions like this...

  • Favor normalization. Denormalization is a form of optimization, with all the requisite trade-offs, and as such it should be approached with caution.
+51
Sep 12 '08 at 17:02

I would argue in favor of more tables, but only up to a certain point. Using your example, if you split your user information into two tables, say USERS and ADDRESS, you gain the flexibility to have multiple addresses per user. One obvious use of this is a user who has separate billing and shipping addresses.

The argument for having a separate CITY table is that you only have to store each city's name once and then refer to it whenever you need it. That does reduce duplication, but in this example I think it's overkill. It may save some space, but you'll pay the price in joins when you select data out of your database.
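That split might look something like this (a rough sketch; the table and column names are just for illustration):

    -- One customer, many addresses: the foreign key lives on the address side.
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        VARCHAR(100) NOT NULL
    );

    CREATE TABLE address (
        address_id   INTEGER PRIMARY KEY,
        customer_id  INTEGER NOT NULL REFERENCES customer (customer_id),
        address_type VARCHAR(10) NOT NULL,   -- e.g. 'billing' or 'shipping'
        street       VARCHAR(200),
        city         VARCHAR(100),
        state        VARCHAR(50),
        postal_code  VARCHAR(20)
    );

Pulling a customer's shipping address is then one join on customer_id, and adding another kind of address later is just another row.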

+11
Sep 12 '08 at 16:50

This doesn't sound like a question about tables vs. columns so much as a question about normalization. In some situations a high degree of normalization ("more tables", in this case) is good and clean, but it typically takes a large number of JOINs to get to the results you want, and with a large enough dataset that can lead to poor performance.
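To give a feel for it, here is a hypothetical query against a fully normalized version of the customer example from the question (all names invented for illustration):

    -- Reassembling one customer record when name, address, city, state and
    -- postal code each live in their own table.
    SELECT n.full_name,
           a.street,
           c.city_name,
           s.state_name,
           pc.postal_code
    FROM customer cu
    JOIN customer_name n  ON n.name_id         = cu.name_id
    JOIN address       a  ON a.customer_id     = cu.customer_id
    JOIN city          c  ON c.city_id         = a.city_id
    JOIN state         s  ON s.state_id        = a.state_id
    JOIN postal_code   pc ON pc.postal_code_id = a.postal_code_id
    WHERE cu.customer_id = 42;

Five joins just to print a mailing label, and the optimizer has to work through all of them on every call.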

Jeff wrote a little about this with regard to the design of Stack Overflow; see also the related post by Dare Obasanjo.

+10
Sep 12 '08 at 16:48

It depends on your database flavor. MS SQL Server, for example, tends to prefer narrower tables, which is also the more "normalized" approach. Other engines may prefer it the other way around; mainframes tend to fall into that category.

+5
Sep 12 '08 at 16:47

A fully normalized design (i.e. "more tables") is more flexible, easier to maintain, and avoids duplication of data, which means your data integrity is going to be much easier to enforce.

Those are powerful reasons to normalize. I would choose to normalize first, and only denormalize specific tables after you see that performance is becoming a problem.

My experience is that in the real world you don't reach the point where denormalization is necessary, even with very large datasets.

+5
Sep 12 '08 at 16:54

Each table should only contain columns that pertain to the entity uniquely identified by the primary key. If all the columns in the database are attributes of the same entity, you would only need one table with all the columns.

If any of the columns may be null, though, you would need to put each nullable column into its own table with a foreign key to the main table in order to normalize it. This is a common scenario, so for a cleaner design you are likely to be adding more tables than columns to existing tables. Also, by putting these optional attributes in their own table, they no longer need to allow nulls, and you avoid a raft of NULL-related issues.
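A rough sketch of that pattern, using a made-up optional attribute:

    -- Instead of a nullable loyalty_number column on customer...
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        VARCHAR(100) NOT NULL
    );

    -- ...the optional attribute gets its own table and is simply absent when
    -- there is no value, so the column itself never has to allow NULL.
    CREATE TABLE customer_loyalty (
        customer_id    INTEGER PRIMARY KEY REFERENCES customer (customer_id),
        loyalty_number VARCHAR(20) NOT NULL
    );

Customers without a loyalty number just have no row in customer_loyalty; a LEFT JOIN brings the value back when you want an all-in-one view.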

+4
Sep 12 '08 at 17:00

A database with more tables is much more flexible if any of those one-to-one relationships may become one-to-many or many-to-many in the future. For example, if you need to store multiple addresses for some customers, it's much easier if you have a customer table and an address table. I can't see a situation where you might need to duplicate some parts of an address but not others, so separate address, city, state and postal code tables may be a bit over the top.

+3
Sep 12 '08 at 16:50

Like everything else: it depends.

There is no hard and fast rule regarding column count vs. table count.

If your customers can have multiple addresses, then a separate table for them makes sense. If you have a really good reason to normalize the City column into its own table, then that can go too, but I haven't seen it done, because City is (usually) a free-form field.

A table-heavy, normalized design is space-efficient and looks nice in a textbook, but it can get extremely complex. It looks great until you have to do 12 joins to get a customer's name and address. These designs are not automatically better at the thing that matters most: query performance.

Skip the complexity if you can. For example, if a customer can only ever have two addresses (rather than arbitrarily many), it makes sense to just keep them all in one table (CustomerID, Name, ShipToAddress, BillingAddress, ShipToCity, BillingCity, etc.).
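A sketch of that flatter design (column names only illustrative):

    -- Exactly two well-known addresses per customer, kept on one row.
    CREATE TABLE customer (
        customer_id     INTEGER PRIMARY KEY,
        name            VARCHAR(100) NOT NULL,
        ship_to_address VARCHAR(200),
        ship_to_city    VARCHAR(100),
        billing_address VARCHAR(200),
        billing_city    VARCHAR(100)
    );

No joins are needed to print a shipping label; the trade-off is that a third address type means a schema change.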

Here's Jeff's post on the topic.

+3
Sep 12 '08 at 16:53

There are advantages to having tables with fewer columns, but you also need to look at the scenario above and answer this question:

Will the customer be allowed to have more than one address? If not, then a separate address table isn't required. If so, a separate table becomes useful, because you can easily add more addresses as needed down the road, whereas adding more columns to a wide table is much more awkward.
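To put that in concrete terms (hypothetical schema, and dialects vary slightly on ALTER syntax):

    -- With a separate address table, another address is just another row:
    INSERT INTO address (customer_id, address_type, street, city)
    VALUES (42, 'vacation', '12 Shore Rd', 'Brighton');

    -- With one wide customer table, you are changing the schema instead,
    -- and touching every query that reads it:
    ALTER TABLE customer ADD COLUMN address3_street VARCHAR(200);
    ALTER TABLE customer ADD COLUMN address3_city   VARCHAR(100);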

+2
Sep 12 '08 at 16:48

I would lean toward normalization as a first step, so city, county, state and country would be better as separate columns... the power of the SQL language, together with today's DBMSs, lets you combine your data later if you need to view it in a different, denormalized way.

While the system is being developed, you can consider denormalizing some part of it if you see that as an improvement.

+1
Sep 12 '08 at 16:48

I think it's a balance in this case. If it makes sense to put a column in a table, then put it in the table; if it doesn't, then don't. Your colleague's approach would certainly help to normalize the database, but that may not be very useful if you have to join 50 tables together to get the information you need.

I guess my answer would be: use your best judgment.

+1
Sep 12 '08 at 16:48

There are many sides to this, but from an application-efficiency point of view more tables can sometimes be more efficient. If you have a few tables with a bunch of columns, then every time the database takes a lock to perform an operation, more data is made unavailable for the duration of that lock. If locks get escalated to the page or table level (well, hopefully not the table level :)), you can see how this can slow the system down.

+1
Sep 12 '08 at 16:49

Hm.

I think it's pretty much a wash and depends on your particular design model. Definitely factor entities that have more than a few fields out into their own table, as well as entities whose makeup is likely to change as your application's requirements change (for example, I'd factor out address anyway, because it has so many fields, but especially if you think you may need to handle addresses from other countries, which can have a different shape. The same goes for phone numbers).

That said, keep an eye on performance as you go along. If you've factored out an entity in a way that requires large, costly joins, it may be better to fold that table back into the original.

+1
Sep 12 '08 at 17:46

Queries benefit hugely from selecting as few columns as possible, but the table itself can have a large number of them. Jeff talks about this too.

Basically, make sure you don't ask for more than you need when you run a query. Query performance is directly related to the number of columns you select.
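In practice that just means preferring an explicit column list over SELECT * (hypothetical table):

    -- Drags back every column, including wide ones the caller never looks at:
    SELECT * FROM customer WHERE customer_id = 42;

    -- Returns only what is actually needed:
    SELECT name, postal_code FROM customer WHERE customer_id = 42;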

0
Sep 12 '08 at 16:49

I think you need to look at the kind of data you're storing before making a decision like this. Having an address table is great, but only if the likelihood of several people sharing the same address is high. If every person has a different address, keeping that data in another table just introduces unnecessary joins.

I don't see the benefit of having a city table unless cities are, in and of themselves, entities you care about in your application, or unless you want to limit the number of cities available to your users.

The bottom line is that a decision like this needs to take the application itself into account before you start shooting for efficiency. IMO.

0
Sep 12 '08 at 16:50

When you design your database, you should stay as close as possible to the meaning of the data, not to your application's needs!

A good database design should last more than 20 years without change.

A customer can have several addresses; that is the reality. If you decided that your application is limited to one address for the first release, that is a concern of your application design, not of the data!

It is better to have multiple tables instead of multiple columns, and to use a view if you want to simplify your queries.
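For example, a view can hide the joins over the normalized tables from callers (a sketch, assuming a customer/address split like the one discussed in other answers):

    CREATE VIEW customer_mailing AS
    SELECT cu.customer_id,
           cu.name,
           a.street,
           a.city,
           a.postal_code
    FROM customer cu
    JOIN address a ON a.customer_id = cu.customer_id
    WHERE a.address_type = 'shipping';

    -- Callers then query one "virtual table":
    -- SELECT name, city FROM customer_mailing WHERE customer_id = 42;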

In most cases, database performance problems come down to network behavior (chatty queries that each return a single row, fetching null columns you don't need, and so on) rather than the complexity of your queries.

0
Jul 29 '14 at 7:46

Normalize your tables first. That avoids redundant data and gives you fewer rows of data to scan, which improves your queries. Then, if you reach a point where joining the normalized tables makes a query take too long (an expensive join clause), denormalize where appropriate.
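One common shape for that kind of targeted denormalization is to copy a frequently joined value back onto the main table and keep it in sync from then on (a sketch with assumed table names; exact ALTER/UPDATE syntax varies by dialect):

    -- Redundant copy of the shipping city so the hot query can skip two joins.
    ALTER TABLE customer ADD COLUMN city_name VARCHAR(100);

    UPDATE customer
    SET city_name = (SELECT c.city_name
                     FROM address a
                     JOIN city    c ON c.city_id = a.city_id
                     WHERE a.customer_id = customer.customer_id
                       AND a.address_type = 'shipping');

From then on the application (or a trigger) has to refresh city_name whenever the underlying address changes.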

0
Aug 11 '14 at

It's nice to see so many inspiring and well-founded answers.

My answer would be (unfortunately): it depends.

Two cases:

  • If you are creating a data model that is meant to be used for many years and may therefore have to absorb many subsequent changes: go for more tables with fewer columns and fairly strict normalization.
  • In other cases you can choose freely between more tables with fewer columns and fewer tables with more columns. Especially for people relatively new to the subject, the latter approach can be more intuitive and easier to understand.

The same goes for choosing between an object-oriented approach and the alternatives.

0
Nov 21 '16 at 12:11


