What is the disadvantage of using strings for data other than String?

Question


I know this may be a kind of "stupid" question. In the software applications I have written so far, I basically initialized all my variables as strings and stored them in my database as VARCHAR. Then I read them back from the database and converted them as needed. Is there a reason this is not an efficient way of initializing variables and storing them in my database?

I know that for extremely large applications this can cause a computation-time problem, because I am needlessly converting variables that could have been given the appropriate types to begin with. But for small applications, is this "normal" to do?

+5

java types

Ryan shukis Feb 22 '15 at 5:58


2 answers

There are several reasons. For example, consider searching for a time range. This is easy to do with datetime fields. With strings it is not so simple, since you have to do it in your application.
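To illustrate the time-range point, here is a minimal Java sketch. It assumes, purely for illustration, that the dates were stored as text in a day/month/year layout; the class name DateRangeDemo, the format, and the sample values are all made up. Comparing the raw strings gives the wrong answer, so every value has to be parsed in the application before the range check works.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class DateRangeDemo {
    // Hypothetical text format used for dates kept in a VARCHAR column.
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("dd/MM/yyyy");

    // Range check on the raw strings: compares character by character.
    static boolean inRangeAsText(String value, String from, String to) {
        return value.compareTo(from) >= 0 && value.compareTo(to) <= 0;
    }

    // Range check on real dates: parse first, then compare chronologically.
    static boolean inRangeAsDate(String value, String from, String to) {
        LocalDate d = LocalDate.parse(value, FMT);
        return !d.isBefore(LocalDate.parse(from, FMT))
            && !d.isAfter(LocalDate.parse(to, FMT));
    }

    public static void main(String[] args) {
        // 15 March 2015 does not lie between 1 and 28 February 2015,
        // yet the text comparison claims that it does.
        System.out.println(inRangeAsText("15/03/2015", "01/02/2015", "28/02/2015")); // true
        System.out.println(inRangeAsDate("15/03/2015", "01/02/2015", "28/02/2015")); // false
    }
}
```

With a real datetime column the database can do this comparison itself, so none of the parsing code is needed.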

Another point: sorting on a varchar field behaves differently from sorting on an int field. In a varchar column 10 comes before 2, but in an int column it comes after.
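The same ordering difference shows up in Java, which makes it easy to try out. This is only a sketch; the class name SortDemo and the sample values are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SortDemo {
    // Sorts the values as text (lexicographic order), like a VARCHAR column.
    static List<String> sortedAsText(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        Collections.sort(copy);
        return copy;
    }

    // Parses the values first and sorts them numerically, like an INT column.
    static List<Integer> sortedAsNumbers(List<String> values) {
        List<Integer> numbers = new ArrayList<>();
        for (String v : values) {
            numbers.add(Integer.parseInt(v));
        }
        Collections.sort(numbers);
        return numbers;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("2", "10", "1");
        System.out.println(sortedAsText(raw));    // [1, 10, 2]
        System.out.println(sortedAsNumbers(raw)); // [1, 2, 10]
    }
}
```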

+1

Hightower Feb 22 '15 at 6:10


Willie wheeler · Accepted Answer · 2015-02-22T06:34:48+0000

Some reasons for using the correct types

1. Least surprise. If developers are going to read numerical data from your database, it will be strange for them that you store it as strings.

2. Developer convenience. Every time you read the data, you have to parse it into the correct type. If you just store it as the right type, you save people from having to write

    int age = 0;
    try {
        age = Integer.parseInt(ageStr);
    } catch (NumberFormatException e) {
        throw new RuntimeException(e);
    }

throughout the code.

3. Data quality. The code example above hints at a third problem. It is now possible for someone to store “no_age” or “foo” or anything else in the column, which is a data quality problem. The best way to deal with errors is to make them impossible in the first place.

4. Storage efficiency. Different types have different ways of encoding data, and strings are not an efficient way to store numbers, bits, etc.

5. Network efficiency. If you store data in wasteful formats, this often leads to unnecessary network use. This is why binary formats are usually more efficient than text formats such as JSON or XML. But web services often do not treat network efficiency as the driving engineering concern.

6. Processing efficiency. If the data is inherently numerical, then forcing everyone to parse it incurs a processing cost.

7. Different types support different rules. In his answer, Hightower points out that different types have their own ordering rules, which affects sorting and range queries. I like this point because it affects the actual behavior of the program, while the problems I mentioned above may be more academic for small applications with one developer.
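To make points 2 and 3 a bit more concrete, here is a rough Java sketch of what every reader of a string-typed age ends up needing. The names AgeDemo and ageFromText are hypothetical. The point is that each caller must decide what to do with a missing or invalid value, whereas with an integer column the value arrives as a number and the invalid case cannot occur.

```java
import java.util.OptionalInt;

class AgeDemo {
    // Hypothetical helper needed when the age is stored as text:
    // returns an empty result for null or non-numeric input.
    static OptionalInt ageFromText(String ageStr) {
        if (ageStr == null) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(ageStr.trim()));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        // A well-formed value parses, but nothing stops bad values
        // from sitting in a VARCHAR column.
        System.out.println(ageFromText("42").isPresent());     // true
        System.out.println(ageFromText("no_age").isPresent()); // false
    }
}
```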

Storage Efficiency Example

Suppose you want to store eight bits. If you stored them as a string, you might have "TFFTFFTF", which in UTF-8 or ASCII takes 64 bits (8 characters x 8 bits per character) to store eight bits of actual information. Relatively speaking, that is a big difference.

By the way, even if your data is numeric, it is not a good idea to just use BIGINT for everything, for example. Different integer types in the database have different storage requirements, so you should think about the number of bits you need, use unsigned representations where appropriate (there is no reason to spend a sign bit on numbers that cannot be negative), etc. Wasteful choices tend to add up quickly when you create new foreign keys, which now have to be BIGINT as well, new rows, each of which carries a bunch of BIGINTs, etc. Storage and backup requirements end up larger than they need to be.
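The arithmetic can be checked with a short Java sketch. The class name BitStorageDemo and the packing convention (first flag in the highest bit) are my own choices for the illustration, not something any database prescribes.

```java
import java.nio.charset.StandardCharsets;

class BitStorageDemo {
    // Number of bits used when the flags are stored as UTF-8 text.
    static int bitsAsText(String flags) {
        return flags.getBytes(StandardCharsets.UTF_8).length * 8;
    }

    // Packs exactly eight 'T'/'F' flags into one byte,
    // with the first flag in the highest bit.
    static byte pack(String flags) {
        if (flags.length() != 8) {
            throw new IllegalArgumentException("expected exactly 8 flags");
        }
        int packed = 0;
        for (int i = 0; i < 8; i++) {
            char c = flags.charAt(i);
            if (c != 'T' && c != 'F') {
                throw new IllegalArgumentException("flags must be 'T' or 'F'");
            }
            packed = (packed << 1) | (c == 'T' ? 1 : 0);
        }
        return (byte) packed;
    }

    public static void main(String[] args) {
        String flags = "TFFTFFTF";
        System.out.println(bitsAsText(flags)); // 64
        System.out.println(Byte.SIZE);         // 8
        // Mask with 0xFF so the byte prints as an unsigned bit pattern.
        System.out.println(Integer.toBinaryString(pack(flags) & 0xFF)); // 10010010
    }
}
```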

So. OK to use strings?

These efficiency issues may not matter at all for the small applications you asked about. Or there may be reasons to prefer a less efficient format over a more efficient one, as my JSON/XML example shows. So as to whether this is “normal”, I cannot answer that, but I hope the considerations above give you some tools for making the decision.

However, I would try to get into the habit of using the correct type, and I certainly would not go out of my way to store things as strings without some reason. In the bitset case, I could see someone wanting to avoid dealing with bit manipulation, which can be tricky until you get the hang of it. (But some databases have special bitset types.) You mention not knowing the type, and that is probably a plausible reason in some cases, although I would lean more on refactoring.
