MySQL string types come in two flavors: those without a character set label and those with one.
CHAR(n) is a fixed-length string, padded with trailing spaces. The matching type without a character set label is BINARY(n). Storing the string "hello" in CHAR(255) CHARSET utf8 takes 765 bytes: the string is padded with spaces to the full length and stored as utf8, which needs 3 bytes per character in the worst case, so 3 * 255 bytes are allocated.
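The 765-byte figure is plain worst-case arithmetic. A minimal sketch in Python, following the accounting used in the text (3 bytes per character for utf8); the helper name char_storage_bytes is made up for illustration and is not a MySQL function:

```python
def char_storage_bytes(n_chars: int, max_bytes_per_char: int) -> int:
    """Worst-case bytes reserved for a fixed-length column of n_chars characters.

    Every character position is budgeted at the widest encoding of the
    character set, regardless of what is actually stored.
    """
    return n_chars * max_bytes_per_char


# CHAR(255) CHARSET utf8: 3 bytes per character in the worst case
print(char_storage_bytes(255, 3))
# BINARY(255): one byte per position
print(char_storage_bytes(255, 1))
```

The length of the stored string does not appear in the formula at all, which is the point: "hello" costs as much as a 255-character value.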
VARCHAR(n) is a variable-length string with one or two bytes of length and no padding. The matching type without a character set label is VARBINARY(n). Storing the string "hello" in VARCHAR(255) CHARSET utf8 takes 6 bytes (1 byte of length plus 5 bytes for the actual text). Storing the string クリス in the same type takes 10 bytes (1 byte of length plus 3 characters at 3 bytes each).
mysql> select hex('クリス'), length(hex('クリス'))/2 as bytes;
+--------------------+--------+
| hex('クリス')      | bytes  |
+--------------------+--------+
| E382AFE383AAE382B9 | 9.0000 |
+--------------------+--------+
1 row in set (0.02 sec)
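The same byte counts can be reproduced outside the server. This is a small Python sketch, not anything MySQL ships: varchar_storage_bytes is a hypothetical helper, and the one-byte length prefix is taken from the text as an assumption (the real prefix size depends on the column definition and the storage engine).

```python
def varchar_storage_bytes(text: str, length_prefix_bytes: int = 1) -> int:
    """Length prefix plus the UTF-8 encoded payload, with no padding."""
    return length_prefix_bytes + len(text.encode("utf-8"))


for s in ("hello", "クリス"):
    encoded = s.encode("utf-8")
    # The hex dump corresponds to what HEX() shows for a utf8 column.
    print(s, encoded.hex().upper(), len(encoded), varchar_storage_bytes(s))
```

For クリス the hex dump is E382AFE383AAE382B9, the same nine bytes as in the mysql session above.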
TINYTEXT, TEXT, MEDIUMTEXT and LONGTEXT are variable-length strings with one, two, three or four bytes of length, respectively. The matching types without a character set label are TINYBLOB, BLOB, MEDIUMBLOB and LONGBLOB.
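The width of the length prefix determines how large a value each type can hold. A quick Python sketch of that arithmetic, using MySQL's names for the four types (the four-byte variant is LONGTEXT); the dictionary and function names are illustrative only:

```python
# Bytes used for the length prefix of each type, as listed in the text.
LENGTH_PREFIX_BYTES = {"TINYTEXT": 1, "TEXT": 2, "MEDIUMTEXT": 3, "LONGTEXT": 4}


def max_payload_bytes(prefix_bytes: int) -> int:
    """Largest length that fits into an unsigned integer of prefix_bytes bytes."""
    return 2 ** (8 * prefix_bytes) - 1


for name, width in LENGTH_PREFIX_BYTES.items():
    print(f"{name:<10} {width} length byte(s) -> up to {max_payload_bytes(width):,} bytes")
```

The same limits apply to the BLOB counterparts, since they differ only in lacking a character set label.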
The TEXT / BLOB types differ from the VARCHAR / VARBINARY types in how and where the data is stored. See http://www.mysqlperformanceblog.com/2010/02/09/blob-storage-in-innodb/ for details on how TEXT / BLOB values are stored in InnoDB depending on the version and the ROW_FORMAT setting. For performance reasons, you want a recent InnoDB and Barracuda format tables.
MySQL cannot work with any piece of data that is larger than max_allowed_packet (default: 1M) unless you build complex and resource-intensive workarounds on the server side. This further restricts what can be done with the TEXT / BLOB types, and typically makes the LONGTEXT / LONGBLOB types useless in the default configuration.
For types without a character set label (BINARY, VARBINARY and %BLOB%), MySQL takes the data as received and writes it to disk unchanged. For types with a character set label, MySQL looks at what you declared to the server as your client character set using SET NAMES, and at what the column's character set label specifies. It then converts from the connection character set to the column character set and writes the converted data. You can check the result with the HEX() function, for example: SELECT HEX(str) FROM t WHERE id = ...
On retrieval, the character set declared with SET NAMES may differ from what it was at the time of writing. MySQL again compares the column's character set label with the character set declared for this connection and, if necessary, converts the data to the connection character set.
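The write and read conversions described above can be imitated outside the server. This is a rough Python sketch, not MySQL's actual code path: it assumes a connection character set of latin1 and a column declared as utf8, and Python's codecs stand in for the server's conversion. Python's "latin-1" is ISO-8859-1, while MySQL's latin1 is closer to cp1252; the sample character is encoded identically in both.

```python
def convert(data: bytes, from_charset: str, to_charset: str) -> bytes:
    """Re-encode bytes from one character set to another."""
    return data.decode(from_charset).encode(to_charset)


# The client declared latin1 (as with SET NAMES latin1) and sends "ü".
sent_by_client = "ü".encode("latin-1")

# Column with a utf8 label: converted on write.
stored_text = convert(sent_by_client, "latin-1", "utf-8")
# Column without a label (e.g. VARBINARY): stored exactly as received.
stored_binary = sent_by_client

print(sent_by_client.hex().upper())  # bytes on the wire
print(stored_text.hex().upper())     # what HEX(str) would show for the utf8 column
print(stored_binary.hex().upper())   # what HEX(str) would show for the binary column

# On read, the stored bytes are converted back to the connection character set.
returned = convert(stored_text, "utf-8", "latin-1")
```

With these assumptions the client sends FC, the utf8 column holds C3BC, and the client gets FC back on read; a reader connecting with a different character set would receive a different byte sequence for the same stored value.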
The performance cost of this conversion is insignificant compared to the time spent on the disk I/O incurred for such data, so in terms of performance it hardly matters which type you choose. Instead, the rule is: choose a type with a character set label if you are working with text data, and a type without one if you are not.
A related question is often asked: should I choose CHAR or VARCHAR (BINARY or VARBINARY respectively)?
For InnoDB, the answer is always: choose a variable-length data type. Fixed-length data types have no performance advantage in InnoDB, but there is a large size penalty if you choose a fixed-length type and then do not use all the space in it. In addition, fixed-length types in SQL have really weird rules for padding and trimming trailing spaces, which you probably do not want to bother learning. For MyISAM the case may be different, but that is almost never relevant.
Another related question: should I choose VARCHAR or TEXT for my strings (VARBINARY or BLOB, respectively)?
The answer: use a recent version of InnoDB with Barracuda format tables, and then TEXT / BLOB. The reason is explained in detail at http://www.mysqlperformanceblog.com/2011/04/07/innodb-row-size-limitation/ . The upshot: with VARCHAR, or with TEXT / BLOB in pre-Barracuda formats, you run the risk of exceeding the InnoDB row size limit if you have too many of them in one row.
And finally: Should I store files / images / other large blobs or text data in a database?
The answer: usually not. Serving files from a database ( http://mysqldump.azundris.com/archives/36-Serving-Images-From-A-Database.html ) is an expensive operation compared to serving files from a file system. If at all possible, you want to do the latter. There is a way around this, http://www.blobstreaming.org/ , but it is an advanced technology that requires full control over the runtime environment, which you never have in a hosted environment.
To round this off: MEMORY engine tables have no variable-length data types. Therefore, if you see "using temporary" in the output of EXPLAIN, it means:
- VARCHAR is converted to CHAR in a temporary table
- VARBINARY is converted to BINARY
If the temporary table then grows larger than tmp_table_size or max_heap_table_size, it is converted on the fly to MyISAM format and written to disk.
Example: you define a Ruby Active Record User class that contains ten fields marked as :string. Each of them ends up as a VARCHAR(255) CHARSET utf8 column in your Users table.
Elsewhere in your code base, Users is used in a query whose plan includes "using temporary". Under load you are immediately swamped by disk operations, because each row of the Users table now takes at least 7650 bytes in MEMORY, most of which are spaces used as padding. This causes the temporary table to be converted to MyISAM and written to disk.
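The 7650-byte figure and its consequence can be worked out directly. A back-of-the-envelope sketch in Python: the 16 MiB limit is an assumed value standing in for the smaller of tmp_table_size and max_heap_table_size (check your own server settings), the function names are made up for illustration, and per-row overhead in the MEMORY engine is ignored.

```python
def memory_row_bytes(n_columns: int, max_chars: int = 255, max_bytes_per_char: int = 3) -> int:
    """Row size once every VARCHAR has been widened to a fully padded CHAR."""
    return n_columns * max_chars * max_bytes_per_char


def rows_before_disk(limit_bytes: int, row_bytes: int) -> int:
    """Number of whole rows that fit before the in-memory limit is exceeded."""
    return limit_bytes // row_bytes


row = memory_row_bytes(10)         # ten :string fields, VARCHAR(255) CHARSET utf8
assumed_limit = 16 * 1024 * 1024   # assumption: 16 MiB in-memory limit

print(row)
print(rows_before_disk(assumed_limit, row))
```

Under these assumptions only a couple of thousand rows fit in memory before the table spills to disk, however short the stored strings are.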
- no %TEXT% or %BLOB% type can be represented in MEMORY at all, so the temporary table goes straight to disk as MyISAM, even if it would be small enough to fit in memory under the limits above.
This means that any query that involves a TEXT or BLOB type and has a "using temporary" plan should be rewritten to avoid temporary tables that hit the disk.