Should NULLS be handled in code or in a database? Advantages and disadvantages?

Question

I have a few questions about where to handle NULLs. Let me set up a scenario: imagine I have a table with 5 varchar(50) columns, which can serve as the example when giving reasons for using NULLs or empty strings.

1. Is it better to handle NULLs in code or in the database? By this I mean: is it better to assign an empty string to a varchar(50) that does not contain a value, or is it better to assign NULL to the varchar(50) and handle that NULL in code?
2. Does assigning an empty string to a column add performance overhead?
3. How does using NULL versus an empty string affect indexing?
4. My impression is that if you do not allow your database to contain NULLs, you do not need to handle them in code. Is this true?
5. Do data types other than varchar pose the same problems when using a default value, or is it more problematic with string data types?
6. What is the overhead of using the ISNULL function if the table contains NULLs?
7. What are the other advantages/disadvantages?

+6

c# sql-server tsql database-design

Xaisoft Nov 24 '09 at 20:18

9 answers

The main advantage is that you can handle null and empty strings separately in both .NET and SQL code; they can, after all, mean different things.

The disadvantage is that you need to be careful: in .NET you must not call obj.SomeMethod() on a null reference, and in SQL you need to remember that nulls propagate when concatenated (unlike, for example, C# string concatenation).

In practice, there is no noticeable size difference between null and empty. In .NET code, I would hope it uses the interned empty string, but that will not make much difference.

+2

Marc Gravell Nov 24 '09 at 20:25

NULL is stored more efficiently (in the NULL bitmap) than an empty string (2 bytes for the varchar length, or "n" bytes for char).

Storage Engine blog: Why is the NULL bitmap in a record an optimization?

I have seen some articles that say otherwise, but for char/varchar I have found NULL to be useful, and I usually treat an empty string the same as NULL anyway. I have also found NULL to be faster in queries than an empty string. YMMV, of course, and I would evaluate each case on its own merits.

+2

gbn Nov 24 '09 at 20:37

You are conflating an implementation concern with a logical data-modeling concern.

You should decide whether or not to allow NULLs in a field solely on the basis of whether that accurately models the data you expect to store in the database. Part of the confusion, as some others have noted, is that NULL and an empty string are not just two ways of storing the same information.

NULL means there is no value, or the value is unknown.
An empty string means there is a value, and that value is an empty string.

Let me give an example. Say you have a middle name field and you need to distinguish between the situation where the middle name was not entered and the situation where the person has no middle name. Use an empty string to indicate that there is no middle name, and NULL to indicate that it was not entered.
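
As a rough sketch of that middle-name example: the snippet below uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the `person` table and its rows are made up for illustration. It only shows that the two states can be queried separately.

```python
import sqlite3

# Hypothetical table; SQLite is used here only as a stand-in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT NOT NULL, middle_name TEXT)")
conn.executemany(
    "INSERT INTO person (name, middle_name) VALUES (?, ?)",
    [
        ("Ann", "Marie"),  # middle name known
        ("Bob", ""),       # known to have no middle name
        ("Cy", None),      # middle name never entered (stored as NULL)
    ],
)


def names_where(condition):
    """Return the names of people matching a WHERE condition, sorted."""
    rows = conn.execute(
        f"SELECT name FROM person WHERE {condition} ORDER BY name"
    )
    return [name for (name,) in rows]


no_middle_name = names_where("middle_name = ''")   # ['Bob']
not_entered = names_where("middle_name IS NULL")   # ['Cy']
```

Note that `middle_name = ''` does not match the NULL row, so the "not entered" case has to be tested with `IS NULL`.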

In almost all cases where NULL makes sense for the data, it should be handled in the application code rather than in the database, on the assumption that the database needs to distinguish between the two different states.

Short version: do not choose between NULL and an empty string based on performance or storage concerns in the database; choose the one that best models the information you are trying to store.

+2

Johnfx Nov 24 '09 at 21:02

I think that a null value and an empty string are two different things in both the code and the database. A variable or field that is null has no value, whereas an empty string does have a value: the empty string.

0

Ian schmits Nov 24 '09 at 20:25

1: Very subjective. As noted in other answers, there is a tangible difference between NULL (no answer/unknown) and "" (known to be nothing/not applicable, e.g. a person without a middle name).

2: It should not.

3: AFAIK (I'm still a junior/learning database administrator, so take this with a grain of salt), there should not be an effect.

4: This is debatable. In theory, if you apply a NOT NULL constraint to a database field, you should never have to handle a NULL value. In practice, the gap between theory and practice is smaller in theory than it is in practice. (In other words, you probably still have to handle the NULL case, even if it is theoretically impossible.)
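
To illustrate point 4, here is a small sketch of what a NOT NULL constraint does and does not give you. It uses Python's sqlite3 as a stand-in for SQL Server, and the `customer` table and `try_insert` helper are hypothetical.

```python
import sqlite3

# Hypothetical table; SQLite is only a stand-in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT NOT NULL)"
)


def try_insert(email):
    """Insert a row and report whether the database accepted it."""
    try:
        conn.execute("INSERT INTO customer (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        # Raised when the NOT NULL constraint rejects the value.
        return False


accepted_value = try_insert("a@example.com")  # True
accepted_null = try_insert(None)              # False
```

The constraint guarantees what is stored in that column. It says nothing about values that reach the application by other routes, so some defensive handling in code is usually still needed.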

0

Simon ighighs Nov 24 '09 at 20:45

I usually default to NOT NULL during design unless there is a reason to do otherwise. In particular, for money/decimal columns in accounting there is usually no unknown aspect. There may be cases where a money column is optional (for example, a survey or business relationship system where you record household/business income, which may not be known until the account manager has formed a relationship). For datetime, I would never allow NULL in a RecordCreated column, for example, while a BirthDate column would allow NULL.

NOT NULL columns remove a lot of potential extra code and ensure that users do not have to account for NULLs with special handling, which is especially good in presentation-layer views or reporting dictionaries.

I think that during design it is important to spend a lot of time on data types (char vs. varchar vs. nchar vs. nvarchar, money vs. decimal, int vs. varchar, GUID vs. identity), NULL/NOT NULL, the primary key, the choice of clustered index, and nonclustered indexes and INCLUDE columns. I know that probably sounds like everything in database design, but if the answers to all these questions are understood up front, you will have a much clearer conceptual model.

Note that even in a database where there are no nullable columns, a LEFT JOIN in a view may still produce NULLs.
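
A minimal sketch of that effect, using Python's sqlite3 as a stand-in for SQL Server; the `customer` and `phone` tables are invented for the example. Every column is declared NOT NULL, yet the outer join still returns a NULL.

```python
import sqlite3

# Hypothetical schema: every column is NOT NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE phone (customer_id INTEGER NOT NULL, number TEXT NOT NULL);
    INSERT INTO customer (id, name) VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO phone (customer_id, number) VALUES (1, '555-0100');
""")

# Bob has no phone row, so the outer join fills p.number with NULL.
rows = conn.execute("""
    SELECT c.name, p.number
    FROM customer AS c
    LEFT JOIN phone AS p ON p.customer_id = c.id
    ORDER BY c.id
""").fetchall()
# rows == [('Ann', '555-0100'), ('Bob', None)]
```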

For a specific example of the decision-making process, take the simple case of Address1, Address2, Address3, etc., all varchar(50): a fairly common scenario (which might be better represented as a single TEXT column, but let's assume it is modeled this way). I would not allow NULLs, and I would default to an empty string. The reasons for this are:

1) It is not really unknown; it is blank. The nature of UNKNOWN across multiple columns is never going to be well defined. It is very unlikely that you would have a KNOWN Address1 and an UNKNOWN Address2: you either know the whole address or you do not. Unless you are going to have constraints, let them be blank and do not allow NULLs.

2) As soon as people start naively doing things like Address1 + @CRLF + Address2, NULLs start to NULL out the entire address! Unless you are going to wrap them in a view using ISNULL, or change your ANSI NULL settings, why not let them be blank? After all, that is how users see them.
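
The concatenation problem in point 2 can be sketched as follows. This uses Python's sqlite3 as a stand-in, so the operator is `||` rather than T-SQL's `+`, and the portable COALESCE is used where T-SQL code would often use ISNULL; the `address` table is hypothetical.

```python
import sqlite3

# Hypothetical table with a nullable second address line.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE address (address1 TEXT, address2 TEXT)")
conn.execute("INSERT INTO address VALUES ('1 Main St', NULL)")

# Naive concatenation: one NULL operand makes the whole result NULL.
naive = conn.execute(
    "SELECT address1 || ', ' || address2 FROM address"
).fetchone()[0]

# Guarded concatenation: replace NULL with '' before concatenating.
guarded = conn.execute(
    "SELECT address1 || ', ' || COALESCE(address2, '') FROM address"
).fetchone()[0]

# naive is None, guarded is '1 Main St, '
```

If the column were NOT NULL with an empty-string default, the naive form would already behave like the guarded one.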

I would probably use the same logic for a middle name or middle initial, depending on how it is used: is there a difference between someone who has no middle name and someone whose middle name is unknown?

In some cases I would probably not allow an empty string either, and I would enforce that with a constraint. For example, a patient's first and last name, or the company name on a customer. These should never be NULL or blank (or all whitespace or the like). The more of these constraints you have, the better your data quality, and the sooner you catch silly errors such as import problems, NULL propagation, etc.
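
One way such a constraint might look, sketched with Python's sqlite3 as a stand-in for SQL Server. The `patient` table and the `is_accepted` helper are hypothetical, and the exact CHECK expression would need adapting to the target database.

```python
import sqlite3

# Hypothetical table: last_name may be neither NULL nor blank.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE patient (
        last_name TEXT NOT NULL CHECK (TRIM(last_name) <> '')
    )
""")


def is_accepted(last_name):
    """Try to insert a patient and report whether the constraints allow it."""
    try:
        conn.execute(
            "INSERT INTO patient (last_name) VALUES (?)", (last_name,)
        )
        return True
    except sqlite3.IntegrityError:
        # Raised for both NOT NULL and CHECK violations.
        return False


results = {value: is_accepted(value) for value in ("Smith", "", "   ", None)}
# results == {'Smith': True, '': False, '   ': False, None: False}
```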

0

Cade Roux Nov 24 '09 at 21:30

Putting fake data (an empty string for string data, 0 for numbers, some ridiculous date for dates) into the database instead of NULL is almost always a bad choice. These fake values do not mean the same thing as NULL, and, especially for numeric data, it is hard to find a fake value that cannot also be a real value. And once you have put in bad data, you still have to write code around it to make sure things are handled correctly (for example, not returning records that have no real end date), so you have not actually saved anything on the development side.
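
A small sketch of how a fake numeric value can distort results. It uses Python's sqlite3 as a stand-in for SQL Server, and the `reading` table stores the same three hypothetical measurements twice: once with NULL for the unknown one, once with a fake 0.

```python
import sqlite3

# Hypothetical table: the third measurement is unknown.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reading (with_null REAL, with_zero REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO reading (with_null, with_zero) VALUES (?, ?)",
    [(10.0, 10.0), (20.0, 20.0), (None, 0.0)],
)

# AVG skips NULLs, but it cannot tell a fake 0 from a real 0.
avg_null, avg_zero = conn.execute(
    "SELECT AVG(with_null), AVG(with_zero) FROM reading"
).fetchone()
# avg_null == 15.0, avg_zero == 10.0
```

To get the right average from the fake-zero column, every query would need an extra filter to exclude the placeholder, which is the kind of workaround code described above.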

If you cannot know the data at the time of entry, then NULL is the best choice. However, if the data will always be known, use NOT NULL wherever possible.

0

HLGEM Nov 24 '09 at 21:37

You should look into sixth normal form. 6NF was invented specifically to get rid of the problems introduced by the use of NULLs. Many of these problems are made worse by SQL's three-valued logic (true, false, unknown) and programmers' general habit of thinking in two-valued logic.
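
The three-valued logic trap can be sketched like this, with Python's sqlite3 as a stand-in for SQL Server and a made-up `task` table. A programmer thinking in two-valued logic might expect "done" plus "not done" to cover every row.

```python
import sqlite3

# Hypothetical table: task 'c' has an unknown (NULL) status.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task (name TEXT NOT NULL, status TEXT)")
conn.executemany(
    "INSERT INTO task (name, status) VALUES (?, ?)",
    [("a", "done"), ("b", "open"), ("c", None)],
)


def count_where(condition):
    """Count the rows for which the WHERE condition evaluates to true."""
    return conn.execute(
        f"SELECT COUNT(*) FROM task WHERE {condition}"
    ).fetchone()[0]


done = count_where("status = 'done'")        # 1
not_done = count_where("status <> 'done'")   # 1, the NULL row is excluded
total = count_where("1 = 1")                 # 3
```

For the NULL row, both comparisons evaluate to unknown rather than true, so it appears in neither count.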

In 6NF, every time a row/column intersection would have to be marked NULL, the situation is handled by simply omitting the row.

However, I generally do not go for 6NF in database design. Most of the time, the NULLable columns are not used as search criteria or join criteria, and the problems with NULLs do not surface.

0

Walter Mitty Nov 25 '09 at 9:05

Donnie · Accepted Answer · 2009-11-24T20:29:10+0000

My general advice is to declare fields in the database as NOT NULL unless you have a specific need to allow null values, since nulls tend to be very difficult to handle for people who are not used to databases.

Note that an empty string and a null string field do not necessarily mean the same thing (unless you define them to). Often null means "unknown" or "not provided", while an empty string is just that: a provided and known empty string.

Allowing or disallowing null fields depends entirely on your needs.
