Choosing a data type for MySQL?

I have been studying and reading about SQL data types for several days (I know ... I know, that is not very long), and one of the things I find difficult to understand is how to choose the best data type for extensibility, efficiency and ease of access.

I think choosing between basic data types (e.g. int vs varchar) is pretty simple, but how do you choose between things like the blob and text types?

The MySQL manual pages are great, but they are not what we developers love ... efficient.

I think it would be great if we could make a list of MySQL data types, the common advantages / disadvantages of each, and when it would be advisable to choose this data type.

+7
3 answers

MySQL string types come in two flavors: one with no character set label and one with a character set label.

A fixed-length string padded with trailing spaces is CHAR(n). The matching type without a character set label is BINARY(n). Storing the string "hello" in CHAR(255) CHARSET utf8 will take 765 bytes: the string is padded with spaces to its full length and stored as utf8, which uses 3 bytes per character in the worst case, so 3 * 255 bytes are allocated.

A variable-length string with one or two bytes of length and no padding is VARCHAR(n). The matching type without a character set label is VARBINARY(n). Storing the string "hello" in VARCHAR(255) CHARSET utf8 will take 6 bytes (1 byte of length plus 5 bytes for the actual text). Storing the string クリス in the same type will take 10 bytes (1 byte of length plus 3 characters using 3 bytes each).

 mysql> select hex('クリス'), length(hex('クリス'))/2 as bytes;
 +--------------------+--------+
 | hex('クリス')      | bytes  |
 +--------------------+--------+
 | E382AFE383AAE382B9 | 9.0000 |
 +--------------------+--------+
 1 row in set (0.02 sec)
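The byte counts quoted above can be reproduced with a little arithmetic. The following is a minimal Python sketch, not a model of any storage engine: the function names are made up for illustration, and the one-byte length prefix is simply the figure used in this answer.

```python
# Sketch: storage estimates for CHAR(n) and VARCHAR(n) under 3-byte utf8,
# following the rules described in this answer (not an exact engine model).

MAX_BYTES_PER_CHAR = 3  # worst case for MySQL's utf8, as stated above


def char_bytes(n: int) -> int:
    """CHAR(n): padded to n characters, worst case reserved for each one."""
    return n * MAX_BYTES_PER_CHAR


def varchar_bytes(value: str, length_prefix: int = 1) -> int:
    """VARCHAR: length prefix (1 or 2 bytes) plus the bytes actually used."""
    return length_prefix + len(value.encode("utf-8"))


print(char_bytes(255))          # 765
print(varchar_bytes("hello"))   # 6
print(varchar_bytes("クリス"))   # 10
```

The gap between 765 and 6 bytes for the same five-letter string is the padding cost that the rest of this answer keeps coming back to.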

Variable-length strings with one, two, three or four bytes of length are TINYTEXT, TEXT, MEDIUMTEXT and LONGTEXT. The matching types without a character set label are TINYBLOB, BLOB, MEDIUMBLOB and LONGBLOB.

The TEXT / BLOB types differ from the VARCHAR / VARBINARY types in how and where the data is stored. See http://www.mysqlperformanceblog.com/2010/02/09/blob-storage-in-innodb/ for details on how TEXT / BLOB types are stored in InnoDB depending on the version and the ROW_FORMAT setting. For performance reasons, you want a recent InnoDB and Barracuda-format tables.

MySQL is unable to work with any data that is larger than max_allowed_packet (default: 1M) unless you build complex, resource-intensive workarounds on the server side. This further restricts what can be done with the TEXT / BLOB types, and typically makes the LONGTEXT / LONGBLOB types useless in the default configuration.
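The number of length bytes determines the largest value each of these types can hold. A short Python sketch of that relationship follows; the dictionary and function names are illustrative only, and LONGTEXT is the name MySQL uses for the four-byte variant.

```python
# Sketch: the largest byte length that an n-byte length prefix can describe.

LENGTH_PREFIX_BYTES = {"TINYTEXT": 1, "TEXT": 2, "MEDIUMTEXT": 3, "LONGTEXT": 4}


def max_length(prefix_bytes: int) -> int:
    """Largest unsigned integer that fits in prefix_bytes bytes."""
    return 2 ** (8 * prefix_bytes) - 1


for name, prefix in LENGTH_PREFIX_BYTES.items():
    print(f"{name:<10} {prefix} length byte(s), up to {max_length(prefix):,} bytes")
```

The same limits apply to the matching BLOB types, since they differ only in the missing character set label.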
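One way to act on this limit is to check a value's size on the client before sending it. The sketch below is only an illustration under stated assumptions: the 1M figure is the default quoted above, and the `overhead` allowance for the surrounding statement text is a guess, not a documented number.

```python
# Sketch: client-side check that a value can be sent in one packet.

MAX_ALLOWED_PACKET = 1 * 1024 * 1024  # 1M, the default quoted in this answer


def fits_in_packet(payload: bytes,
                   limit: int = MAX_ALLOWED_PACKET,
                   overhead: int = 1024) -> bool:
    """True if payload plus an assumed statement overhead stays within limit."""
    return len(payload) + overhead <= limit


small = b"x" * 10_000               # a few KB of text: fine
large = b"x" * (2 * 1024 * 1024)    # 2 MiB: exceeds the 1M default

print(fits_in_packet(small))
print(fits_in_packet(large))
```

On a real server the effective limit can be read with SHOW VARIABLES LIKE 'max_allowed_packet' and passed in as `limit`.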

For types without a character set label (BINARY, VARBINARY and %BLOB%), MySQL takes the data it receives and writes it to disk unchanged. For types with a character set label, MySQL looks at what you declared to the server as your connection character set using SET NAMES, and at the character set label of the column. It then converts from the connection character set to the column character set and writes the converted data. You can check this with the HEX() function, for example SELECT HEX(str) FROM t WHERE id = ...

When retrieving, the connection character set declared with SET NAMES may differ from what it was at write time. MySQL again compares the column's character set label with the character set declared for the connection and, if necessary, converts the data to the connection character set.

The performance penalty for this conversion is insignificant compared to the disk I/O time incurred for such data, so in terms of performance it hardly matters which type you choose. Instead, the rule is: select a type with a character set label if you are working with text data, and a type without one if you are not.
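The conversion described above can be imitated in a few lines of Python to show why the declared connection character set matters. This is a rough sketch: the `store` function is hypothetical, and Python's `latin-1` and `utf-8` codecs stand in for MySQL's latin1 and utf8, which are not identical in every detail.

```python
# Sketch: the server interprets incoming bytes as the connection charset,
# then re-encodes them in the column charset before writing.

def store(client_bytes: bytes, connection_charset: str, column_charset: str) -> bytes:
    """Mimic the write path for a column with a character set label."""
    return client_bytes.decode(connection_charset).encode(column_charset)


sent = "é".encode("utf-8")  # the client really sends UTF-8 bytes: C3 A9

correct = store(sent, "utf-8", "utf-8")    # connection declared correctly
mangled = store(sent, "latin-1", "utf-8")  # connection wrongly declared as latin1

print(correct.hex().upper())  # C3A9
print(mangled.hex().upper())  # C383C2A9
```

The second result is the kind of double-encoded value that a HEX() check reveals in a column.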


A related question is often asked: should I choose CHAR or VARCHAR (BINARY or VARBINARY respectively)?

For InnoDB, the answer is always: select a variable-length data type. InnoDB gains no performance advantage from fixed-length data types, but there is a huge size penalty if you select a fixed-length data type and then do not use all the space in it. Plus, fixed-length SQL strings have really weird rules for padding and trimming trailing spaces, which you probably do not want to study. For MyISAM, the case may be different, but that almost never matters.


Another related question: should I choose VARCHAR or TEXT for my strings (VARBINARY or BLOB, respectively)?

The answer is: use a recent version of InnoDB with Barracuda-format tables, and then TEXT / BLOB. The reason is explained in detail at http://www.mysqlperformanceblog.com/2011/04/07/innodb-row-size-limitation/ . The upshot: with VARCHAR or TEXT / BLOB in pre-Barracuda format tables, you run the risk of exceeding the InnoDB row size limit if you have too many of them in one row.
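To get a feel for how quickly that limit is reached, here is a rough Python sketch. Both constants are assumptions rather than figures from this answer: it supposes that pre-Barracuda formats keep up to 768 bytes of each long column in the row, and that the row limit is roughly 8126 bytes with 16 KB pages. It ignores pointers and all other row overhead.

```python
# Sketch: in-row bytes used by long columns in a pre-Barracuda row format.

INLINE_PREFIX = 768  # assumed bytes of each long column kept in the row
ROW_LIMIT = 8126     # assumed approximate row size limit for 16 KB pages


def inline_bytes(value_lengths):
    """Bytes stored in-row: each value contributes at most INLINE_PREFIX."""
    return sum(min(length, INLINE_PREFIX) for length in value_lengths)


def overflows(value_lengths):
    """True if the in-row portions alone exceed the assumed row limit."""
    return inline_bytes(value_lengths) > ROW_LIMIT


print(overflows([2000] * 10))  # ten long columns: 7680 bytes in-row
print(overflows([2000] * 11))  # eleven long columns: 8448 bytes in-row
```

Under these assumptions the eleventh long column is the one that breaks the row, which matches the "too many of them" warning above.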


And finally: Should I store files / images / other large blobs or text data in a database?

The answer to this: usually not. Serving files from a database ( http://mysqldump.azundris.com/archives/36-Serving-Images-From-A-Database.html ) is an expensive operation compared to serving files from a file system, so if at all possible you want to serve them from the file system. There is a way around this, http://www.blobstreaming.org/ , but it is an advanced technology that requires full control over the runtime environment, which you never have in a hosted environment.


To round it off: there are no variable-length data types in MEMORY engine tables. Therefore, if you see "using temporary" in the output of EXPLAIN, it means:

  • VARCHAR is converted to CHAR in a temporary table
  • VARBINARY is converted to BINARY

If the temporary table in this process grows larger than tmp_table_size or max_heap_table_size, it is converted on the fly to MyISAM format and written to disk.

Example: you define a Ruby Active Record User class that contains ten fields marked as :string . Each of them ends up as a VARCHAR(255) CHARSET utf8 in your Users table.

Elsewhere in your code base, Users is used in a query whose plan includes "using temporary". Under load you die instantly in disk operations, because each row of the Users table now uses at least 7650 bytes in MEMORY, most of which are spaces used as padding. This causes the temporary table to be converted to MyISAM and written to disk.
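The 7650-byte figure, and how few rows it takes to spill to disk, can be worked out as follows. This is a back-of-the-envelope Python sketch: the 16 MiB limit is an assumed value for tmp_table_size / max_heap_table_size (check your own server settings), and per-row overhead in the MEMORY engine is ignored.

```python
# Sketch: size of a padded in-memory temporary row for the Users example.

def memory_row_bytes(columns: int, chars: int, bytes_per_char: int) -> int:
    """Every VARCHAR becomes a CHAR padded to its full worst-case width."""
    return columns * chars * bytes_per_char


def rows_before_disk(limit_bytes: int, row_bytes: int) -> int:
    """Approximate number of rows that fit before the table is converted."""
    return limit_bytes // row_bytes


ASSUMED_LIMIT = 16 * 1024 * 1024  # assumption: 16 MiB in-memory limit

row = memory_row_bytes(columns=10, chars=255, bytes_per_char=3)
print(row)                                   # 7650
print(rows_before_disk(ASSUMED_LIMIT, row))  # 2193
```

So with these assumptions a result of a little over two thousand users is enough to push the temporary table to disk, however short the actual strings are.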

  • any %TEXT% or %BLOB% type cannot be represented in MEMORY at all, so the temporary table goes to disk as MyISAM even if it would be small enough to be held in memory under the limits above.

This means that any query with a TEXT or BLOB type and a “using temporary” plan must be rewritten to avoid the appearance of temporary tables that hit the disk.

+11

Regarding BLOB vs TEXT (since this is the only specific question in your post): BLOB is for binary data, and TEXT is for text data.

It is usually quite simple: use the most specific column type that suits your needs, and fall back to the more generic types if none of them match your usage.

+1

For MySQL, there is a procedure called PROCEDURE ANALYSE that heuristically evaluates the data in a column, with the idea of suggesting the best choice of data type, including a range of values for an ENUM.

A quick dynamic CONCAT script to generate the SQL to run:

 select CONCAT(' SELECT ', COLUMN_NAME, ' FROM ', TABLE_NAME, ' procedure analyse() ;' )
 FROM INFORMATION_SCHEMA.COLUMNS
 WHERE table_schema ="yourDbName"
   AND DATA_TYPE ="varchar"
   AND CHARACTER_MAXIMUM_LENGTH > 190
   AND COLUMN_KEY not in (' ') ;

** The SQL above does not evaluate primary keys, unless they are text fields.

This procedure is useful if you want to change a data type based on how the data is actually used, or to increase efficiency by moving or storing smaller amounts of data.

The Percona blog has a good worked example of PROCEDURE ANALYSE applied to Drupal: https://www.percona.com/blog/2009/03/23/procedure-analyse/

Some of this analysis is done with compression in mind, which is related to the longer indexes needed for utf8mb4: http://techblog.constantcontact.com/devops/space-the-final-frontier-a-story-of-mysql-compression/

+1
