What is the best way to store binary flags / booleans in each database engine?

I saw some possible approaches (in some database systems, some of them are synonyms):

  • TINYINT(1)
  • BOOL
  • BIT(1)
  • ENUM(0,1)
  • CHAR(0) NULL

Please cover all the major database engines supported by PHP; that said, it would be even better if other engines were covered as well.

I am asking which design is best optimized for reading: for example, a SELECT with the flag column in a WHERE or GROUP BY clause. Performance is much more important to me than storage space (unless size affects performance).
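
To make that concrete, here is a rough MySQL-flavoured sketch of the kind of table and queries I have in mind (the table and column names are made up purely for illustration):

    -- Hypothetical table showing the candidate flag declarations side by side
    CREATE TABLE user_account (
        user_id     INT           NOT NULL PRIMARY KEY,
        is_active   TINYINT(1)    NOT NULL DEFAULT 1,    -- option 1
        is_verified BOOL          NOT NULL DEFAULT 0,    -- option 2 (synonym for TINYINT(1) in MySQL)
        is_admin    BIT(1)        NOT NULL DEFAULT 0,    -- option 3
        is_banned   ENUM('0','1') NOT NULL DEFAULT '0',  -- option 4
        newsletter  CHAR(0)       NULL                   -- option 5 (NULL = off, '' = on)
    );

    -- The read patterns I care about:
    SELECT user_id FROM user_account WHERE is_active = 1;
    SELECT is_verified, COUNT(*) FROM user_account GROUP BY is_verified;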

And a few more details:

When creating a table I can't know whether the flag will be sparse (whether most rows will have it on or off), but I can ALTER the table later, so if there is something I can optimize once I do know, please mention it.

Also, please say whether it matters if a row has only one flag (or a few), as opposed to many.

By the way, I read the following somewhere on SO:

Using a boolean does the same thing as using a tinyint, but it has the advantage of semantically conveying your intention, and that is worth something.

Well, in my case it isn't worth anything, because each table is represented by a class in my application, and everything is explicitly defined in the class and well documented.

database-design flags bitflags
3 answers

This answer relates to ISO/IEC/ANSI Standard SQL, and it addresses the freeware SQL offerings as well.

The first problem is that you have identified items from two different categories, not one, so they cannot be compared as equals.

A. Category One

(1), (4) and (5) hold a range of possible values and form one category. All of them can be used easily and efficiently in a WHERE clause. They take the same storage, so neither storage nor read performance is an issue. The remaining choice is therefore simply a matter of which Datatype suits the purpose of the column.

ENUM is non-standard; the better, standard method is a lookup table: the values are then exposed in a table rather than hidden, and can be enumerated by any reporting tool. Reads on ENUM columns also take a small hit due to the internal processing.
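
To illustrate the lookup-table approach, a minimal sketch (the table and column names are my own invention, and the CHAR(1) codes are just one possible choice):

    -- Standard-SQL alternative to ENUM: a lookup table plus a foreign key
    CREATE TABLE flag_value (
        flag_code  CHAR(1)     NOT NULL PRIMARY KEY,   -- 'N' / 'Y'
        flag_label VARCHAR(20) NOT NULL                -- visible, reportable meaning
    );

    INSERT INTO flag_value (flag_code, flag_label) VALUES ('N', 'Off'), ('Y', 'On');

    CREATE TABLE subscriber (
        subscriber_id INT     NOT NULL PRIMARY KEY,
        newsletter    CHAR(1) NOT NULL DEFAULT 'N',
        FOREIGN KEY (newsletter) REFERENCES flag_value (flag_code)
    );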

B. Category Two

(2) and (3) are two-valued elements: True/False; Male/Female; Dead/Alive. That category is different from the first, and its treatment, both in your data model and on each platform, is different. BOOLEAN is just a synonym for BIT; they are one and the same. SQL-wise they are handled identically by all SQL-compliant platforms, and there is no problem using them in a WHERE clause.

The performance difference varies by platform. Sybase and DB2 pack up to 8 BITs into a single byte (not that storage matters here) and handle the bit values on the fly, so performance is really good. Oracle does different things in each version, and I have seen designers use CHAR(1) instead of BIT to get around its performance problems. MS was fine up to 2005, but they broke it in 2008 and the results are unpredictable; so the short answer there may be to implement it as CHAR(1).
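
As a rough illustration of the two options (the names are invented, and the CHECK constraint is only enforced on platforms that honour it):

    -- BIT / BOOLEAN, on platforms that support it well
    CREATE TABLE person (
        person_id INT NOT NULL PRIMARY KEY,
        is_alive  BIT NOT NULL
    );

    SELECT person_id FROM person WHERE is_alive = 1;

    -- CHAR(1) workaround for platforms where BIT performs poorly
    CREATE TABLE person_char (
        person_id INT     NOT NULL PRIMARY KEY,
        is_alive  CHAR(1) NOT NULL CHECK (is_alive IN ('Y', 'N'))
    );

    SELECT person_id FROM person_char WHERE is_alive = 'Y';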

Of course, the assumption is that you are not doing silly things, such as packing 8 separate columns into one TINYINT. Not only is that a serious Normalisation error, it is a nightmare for coders. Keep each column discrete and of the correct Datatype.

C. Multiple Indicators and Nullable Columns

This is independent of, and has nothing to do with, (A) and (B). What the columns are, and their correct Datatype, is a separate matter from how many of them you have and whether they are Nullable. Nullable (usually) means an optional column, which in turn means you have not completed the modelling or Normalisation exercise: the Functional Dependencies are ambiguous. If you do complete the Normalisation exercise, there are no Nullable columns, no optional columns; either a column clearly exists for a given relation, or it does not. That means using the ordinary relational supertype-subtype structure.

Of course that means more tables, but no Nulls. Enterprise DBMS platforms have no problem with more tables or more joins; that is exactly what they are optimised for. Normalised databases perform far better than un-normalised or de-normalised ones, and they can be extended without "re-factoring". You can ease usage by supplying a View for each subtype.
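
A minimal sketch of that supertype-subtype structure, including the convenience View (the entities are invented purely for illustration):

    -- Supertype: columns common to every row
    CREATE TABLE product (
        product_id   INT         NOT NULL PRIMARY KEY,
        product_name VARCHAR(60) NOT NULL
    );

    -- Subtype: a row exists only for products that really are books
    -- (replaces a clutch of Nullable, book-only columns on product)
    CREATE TABLE product_book (
        product_id INT      NOT NULL PRIMARY KEY,
        isbn       CHAR(13) NOT NULL,
        page_count INT      NOT NULL,
        FOREIGN KEY (product_id) REFERENCES product (product_id)
    );

    -- A View per subtype eases usage: one row per book, no Nulls anywhere
    CREATE VIEW book AS
        SELECT p.product_id, p.product_name, b.isbn, b.page_count
        FROM   product p
        JOIN   product_book b ON b.product_id = p.product_id;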

If you want more information on this, see that question/answer. If you need help with the modelling, ask a new question. At the level you are asking, I would advise sticking with 5NF.

D. Performance of Nulls

Separately, if performance is important to you, then exclude Nulls. Every Nullable column is stored as variable length, which requires additional processing on every row/column access. Enterprise databases use "deferred" handling for such rows, so that logging and the like can move through without being held up by rows that are not fixed length. In particular, never use variable-length columns (and that includes Nullable columns) in an index: that requires unpacking on every access.
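
In concrete terms, and assuming the point is simply "fixed-length, NOT NULL columns are cheaper to read and to index", the contrast looks something like this (my example, not part of the original argument):

    -- Nullable flag: stored as variable length on some platforms, extra work per access
    CREATE TABLE shipment_nullable (
        shipment_id INT     NOT NULL PRIMARY KEY,
        is_urgent   CHAR(1) NULL
    );

    -- Fixed length, NOT NULL with an explicit default: cheaper to read and safe to index
    CREATE TABLE shipment_fixed (
        shipment_id INT     NOT NULL PRIMARY KEY,
        is_urgent   CHAR(1) NOT NULL DEFAULT 'N'
    );

    CREATE INDEX ix_shipment_fixed_urgent ON shipment_fixed (is_urgent);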

E. Poll

Finally, I see no reason for this question to be a poll. Fair enough, you will get technical answers and even opinions, but polls are popularity contests, and the technical ability of responders on SO covers a very wide range, so the most popular answers and the most technically correct answers tend to sit at opposite ends of the spectrum.

I know this is not the answer you want, but the difference really is negligible in all but the most extreme edge cases. And in any such case, simply switching the data type will not be enough to solve a performance problem.

For example, here are a few alternatives that will outperform any data-type change by a large factor. Each of them, of course, comes with a trade-off.

If you have 200 optional flags and you only ever query 1-2 of them at a time across a large number of rows, you will get better performance by putting each flag in its own table. If the data is really sparse, it gets even better.
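
A sketch of that one-table-per-flag layout (names invented); only the rows where the flag is on exist at all:

    -- Master table holds no flag columns at all
    CREATE TABLE account (
        account_id INT NOT NULL PRIMARY KEY
    );

    -- One narrow table per flag; the presence of a row means "flag is on"
    CREATE TABLE account_beta_tester (
        account_id INT NOT NULL PRIMARY KEY,
        FOREIGN KEY (account_id) REFERENCES account (account_id)
    );

    -- "Which accounts have the flag?" is a scan of a small table
    SELECT account_id FROM account_beta_tester;

    -- "Does this account have the flag?" is a single-row probe
    SELECT EXISTS (
        SELECT 1 FROM account_beta_tester WHERE account_id = 42
    ) AS has_flag;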

If you have 200 required flags and you only ever read single rows, you should keep them all in the same table.

If you have a small set of flags, you can pack them into a single column using a bitmask. That is storage-efficient, but you cannot (easily) query individual flags. And of course it does not work when a flag can be NULL...
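
A rough sketch of the bitmask approach (my own example; the bit assignments are arbitrary):

    -- All flags packed into a single integer column
    CREATE TABLE profile (
        profile_id INT NOT NULL PRIMARY KEY,
        flags      INT NOT NULL DEFAULT 0
        -- bit 0 (value 1) = active, bit 1 (value 2) = verified, bit 2 (value 4) = admin, ...
    );

    -- Turn the "verified" flag on for one row
    UPDATE profile SET flags = flags | 2 WHERE profile_id = 42;

    -- Find all verified profiles: works, but typically forces a scan rather than an index seek
    SELECT profile_id FROM profile WHERE flags & 2 <> 0;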

Or you can get creative and use a "junk dimension": create a separate table with all 200 boolean flags as columns, and one row for each distinct combination of flag values that actually occurs. Each row gets an auto-increment primary key, which you reference from the master record. Voilà, the main table now holds one int instead of 200 columns. Hacker heaven, a DBA nightmare.
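
A rough MySQL-flavoured sketch of that junk-dimension idea (names invented):

    -- One row per combination of flag values that actually occurs
    CREATE TABLE flag_combo (
        flag_combo_id INT  NOT NULL AUTO_INCREMENT PRIMARY KEY,
        is_active     BOOL NOT NULL,
        is_verified   BOOL NOT NULL,
        is_admin      BOOL NOT NULL
        -- ... up to all 200 flag columns
    );

    -- The master table stores a single surrogate key instead of 200 columns
    CREATE TABLE customer (
        customer_id   INT NOT NULL PRIMARY KEY,
        flag_combo_id INT NOT NULL,
        FOREIGN KEY (flag_combo_id) REFERENCES flag_combo (flag_combo_id)
    );

    -- Reading a flag means joining back to the combination table
    SELECT c.customer_id
    FROM   customer c
    JOIN   flag_combo f ON f.flag_combo_id = c.flag_combo_id
    WHERE  f.is_admin = 1;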

What I'm trying to say is that although it is fun to argue about which is "best", there are other issues of much greater importance (for example, the comment you quoted). Because when you run into a real performance problem, the data type will be neither the cause nor the cure.

Any of the above works fine. I prefer to use BOOL where it is properly supported, because it conveys your intention best, but I would avoid ENUM(0,1).

The first problem with ENUM is that its value must be a string. 0 and 1 look like numbers, so programmers tend to send it a number.

The second problem with ENUM is that if you send it the wrong kind of value, it can silently store the wrong member, and some databases will not even raise an error (I'm looking at you, MySQL). This makes the first problem much worse: if you accidentally send it 1 instead of "1", it will store the value "0" - ever so intuitive!
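
For example, in MySQL (exact behaviour depends on version and sql_mode, so treat this as a sketch of the pitfall rather than a guaranteed result):

    CREATE TABLE t (flag ENUM('0','1') NOT NULL);

    INSERT INTO t VALUES ('1');   -- string: stores '1', as intended
    INSERT INTO t VALUES (1);     -- number: treated as a 1-based index, so it stores '0'

    SELECT flag, COUNT(*) FROM t GROUP BY flag;   -- one '0' row and one '1' row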

I don’t think that it affects all the database engines (I don’t know, I haven’t tried them all), but it affects enough of them that I consider avoiding it to be good practice.
