Is overuse of null columns in the database a “code smell”?

Question

Is overuse of null columns in the database a “code smell”?

I am just entering a project and has a rather large database backend. I started digging into this database, and 95% of the fields are NULL.

Is this common practice in the database world? I'm just a modest programmer, not a database administrator, but I would have thought that you would want to keep fields with a zero value to a minimum, only where they make sense.

Is this a “code smell” if most columns are NULL?

+17

sql

AlexanderTheGreat Jun 23 '09 at 20:19

source share

16 answers

hythlodayr · Answer 1 · 2009-06-23 20:26

Default values are generally an exception, and NULLs are the norm in my experience.

True, zeros are annoying.

It is also extremely useful because null is the best indicator of "NO VALUE". The specific default value is very misleading, and you can lose information or introduce confusion along the way.

Bill Karwin · Answer 2 · 2009-06-23 20:29

Anyone who has developed a data entry application knows how common this is for some fields to be unknown during input - even for columns that are business critical, to answer the @Chris McCall question.

However, the "smell of code" is simply an indicator that something can be encoded in a casual way. You use odors to identify things that require more research, rather than things that need to be changed.

So, if you see nullable columns so consistently, you are right to be suspicious. This may mean that someone was lazy or was afraid to declare NOT NULL columns unambiguously. You can justify your own analysis.

Joel Goodwin · Answer 3 · 2009-06-23 21:50

I am from Extreme NO camp: I constantly avoid NULL. Putting aside the fundamental considerations as to what they really mean (because talking to different people, you will get different answers, such as “no value”, “unknown value”, “missing”, “my ginger cat” called "Zero"), the worst NULL problem is that they often destroy your requests in a mysterious way.

I lost counting the number of times I had to debug a request (well, maybe 9), and traced the problem to connecting to NULL. If your code needs ISNULL to reconnect, then there is a chance that you also lost index applicability and performance.

If you do have to store the value "missing / unknown / null / cat" (and this is what I prefer to avoid), it is better to be explicit.

Specialists at NULL may disagree. Using NULL tends to split the SQL crowds in the middle.

In my experience, heavy use of NULL was positively correlated with database abuse, but I would not cut it into stone tablets like some Law of Nature. My experience is just my experience.

EDIT: An additional thought. It is possible that those who are neutral racists like me are more excited about normalization than those who are pro-NULL. I don’t think that crazy normalizers will be too happy with the torn edges on their tables that can accept NULL. A lot of null values may indicate that database developers do not have serious normalization. Therefore, instead of suggesting NULL code "badly", it can alternatively offer the philosophical position of developers to normalize. Perhaps it comes. Just a thought.

Todd R · Answer 4 · 2009-06-23 20:27

I don’t know if I think this is always bad, but if columns are added because one record (or maybe several) should have values, and most should not, then this indicates a rather flat table composition. If you see column names like "addr1", "addr2", "addr3" then it stinks!

I bet that most of the columns that you have can be deleted and presented in other tables. You can find "nonzero" using a foreign key relationship. This will increase the number of connections you will make, but there may be more preformant that "where no col1 is null".

Gregory Higley · Answer 5 · 2009-06-23 20:48

It seems to me to avoid columns with a null value. Where domain semantics allow the use of a value that explicitly indicates missing data, it should be used instead of NULL.

For example, imagine a table containing a Comment field. Most developers posted NULL here to indicate that there is no data in the column. (And, hopefully, a control restriction that prohibits zero-length strings so that we have a known "value" indicating no value.) My approach is usually the opposite. The Comment column is NOT NULL , and a row of length zero indicates no value. (I use a check constraint to ensure that a zero-length string is indeed a zero-length string, not a space.)

So why should I do this? Two reasons:

NULL requires special logic in SQL, and this method avoids this.
Many client libraries have special values that indicate NULL . For example, if you use Microsoft ADO.NET, the constant DBNull.Value indicates NULL, and you should check it. Using a row of zero length in a NOT NULL column eliminates the need.

Despite all this, there are many circumstances in which NULL are accurate. In fact, I do not mind using them in the above scenario, although this will not be my preferred method.

Whatever you do, be kind to those who will use your tables. Be consistent . Allow them with confidence SELECT . Let me explain what I mean. I recently worked on a project whose database was not developed by me. Almost every column was nullable and had no limits. There was no agreement on what constitutes a lack of value. It could be NULL , a string with zero length, or even a bunch of spaces, and often was. (How this soup of values got there, I don't know.)

Imagine the ugly code that a developer must write to find all of these entries with a missing Comment field in this scenario:

 SELECT * FROM Foo WHERE LEN(ISNULL(Comment, '')) = 0

Surprisingly, there are developers who consider this to be quite acceptable, even normal, despite the potential performance implications. Better would be:

 SELECT * FROM Foo WHERE Comment IS NULL

or

 SELECT * FROM Foo WHERE Comment = ''

If your table is designed correctly, you can use the above two SQL statements to get high-quality data.

Cade Roux · Answer 6 · 2009-06-23 22:32

In short, I would say yes, this is probably the smell of code.

Regardless of whether the column is valid or not, this is very important and must be carefully defined. The question should be evaluated for each column. I do not believe in one default "best practice" for NULL . The “best practice” for me is to completely eliminate the ambiguity during the design and / or refactoring of the table.

For starters, none of your primary key columns will be NULL. Then I strongly lean toward NOT NULL for anything that is a foreign key.

Some other things that I consider:

Criteria in which NULL should be avoided: money - is there really a chance that this amount will be unknown?

Criteria in which NULL can be justified most often: datetime - there are no reserved dates, so NULL is your best option

Other data types: char / varchar columns - for codes / identifiers - NOT NULL almost exclusively int - basically NOT NULL if this is not like the "number of children" where you want to distinguish an unknown answer.

jmucchiello · Answer 7 · 2009-06-23 20:28

No, whether the field should be null or not is a data concept and cannot be the smell of code. Regardless of whether NULL is annoying to the code, it has nothing to do with the usefulness of having NULL fields.

Bruno Martinez · Answer 8 · 2009-06-23 20:28

I am afraid that it is a (very common) smell. See CJ Writing Dates on this topic.

Cody C · Answer 9 · 2009-06-23 20:22

As a best practice, if a column should not be null, then it should be marked as such. However, I do not believe that I am losing my mind.

Chris McCall · Answer 10 · 2009-06-23 20:24

I think so. If you do not need data, it is not important for your business. If it is important for your company, it is necessary.

Matthew Vines · Answer 11 · 2009-06-23 20:27

All this completely depends on the volume and requirements of the project. I would not use the number of fields with a zero value as an indicator for poorly written or developed code. Look at the business domain, if there are many unimaginable fields in the database that can be null in the database, then you have some problems.

MatthewMartin · Answer 12 · 2009-06-23 20:50

In my experience, this is a problem where Null and Not Null do not match the required field / optional field.

In reality, the probability that these are indeed all optional fields. If you find in the business or user interface level that these fields are necessary, then I think that this means that the data model deviated from the business object model and is a sign of excessively conservative policies for changing the database or supervision.

If you run the sample data generator in your data, and then try to load data that is valid in accordance with SQL, you will immediately find out if the rules comply.

HLGEM · Answer 13 · 2009-06-23 21:12

That sounds like a lot, maybe that means you should at least investigate. Please note: if it is a mature product with a lot of data, it can be difficult to convince someone to change the structure. The sooner you catch something like this at the design stage, the easier it is to fix all the related codes in order to customize the changes.

It is bad that they used zeros, it will depend on whether the columns that allow zeros look as if they were linked tables (home phone, cell phone, work phone, etc., which should be in the telephone a table with a satellite) or if they look like things that may not apply to all records (maybe this may be a related table with a one-to-one relationship) or may not be known at the time of data entry (maybe good). I would also like to check if they really have alwAys value (then you could change the null value if the information is really required by the busniess logic). If you have multiple entries with zero

Victor · Answer 14 · 2009-06-23 21:32

In my experience, many fields with zero value in a large database, like you, are very normal. Given this, perhaps many applications are written by different people. Creating null columns is annoying, but perhaps this is the best way to keep the application reliable.

Ian Ringrose · Answer 15 · 2009-06-23 21:34

One of the many ways to map inheritance (for example, C # objects) to the database is to create a table for the class at the top of the hierarchy, and then add columns for all other classes. Columns must be zero if an object of another subclass is stored in the database. This is called Mapping inheritance over a single table (or Map Hierarchy for A Single Table ) and is a standard design pattern.

A side effect of matching inheritance over a single table is that most columns are NULL.

Also, in Oracle, an empty string (length 0) is considered empty, so in some companies all column columns become valid for NULL even on SqlServer. (just because the first client does not want the SqlServer software to mean that the 2nd client does not have an Oracle DBA that will not allow SqlServer to connect to the network)

Chris Marisic · Answer 16 · 2009-10-26 18:07

Throw the opposite opinion. Each individual field in the database must be zero. There is nothing more unpleasant than working with a database, which on each separate tab causes an exception to this or that required. Nothing required.

There is one exception to this, keys. Obviously, all primary and foreign keys must be respected.

The task of the application is to check the data and the database in order to simply store and retrieve what you give them. If it handles validation logic even as simple as null or not, then null makes the project more complex to support different rules that apply to everything.

Is overuse of null columns in the database a “code smell”?

More articles: