- Because avg_row_length is data_length / rows.
data_length is basically the total size of the table on disk. An InnoDB table is more than just a list of rows. So there is extra overhead.
- Because an InnoDB row is bigger than the data it holds.
As above, each row carries some overhead, which adds to the row size. An InnoDB table is also not just a bunch of data packed together. It needs a little extra empty space to operate efficiently.
- Because data is stored on disk in blocks, and those blocks are not always full.
Disks usually store things in 4K, 8K or 16K blocks. Sometimes things don't fit perfectly into those blocks, so you end up with some empty space.
As we will see below, MySQL allocates the table in blocks, and it allocates a lot more than it needs so that it doesn't have to grow the table too often (growing can be slow and lead to disk fragmentation, which makes things even slower).
To illustrate this, let's start with an empty table.
mysql> create table foo ( id smallint(5) unsigned NOT NULL );
mysql> select data_length, table_rows, avg_row_length from information_schema.tables where table_name = 'foo';
+-------------+------------+----------------+
| data_length | table_rows | avg_row_length |
+-------------+------------+----------------+
|       16384 |          0 |              0 |
+-------------+------------+----------------+
It uses 16K, or four 4K blocks, to store nothing. An empty table does not need this space, but MySQL allocated it on the assumption that you are going to put a bunch of data in it. This avoids a costly reallocation on every insert.
Now add a row.
mysql> insert into foo (id) VALUES (1);
mysql> select data_length, table_rows, avg_row_length from information_schema.tables where table_name = 'foo';
+-------------+------------+----------------+
| data_length | table_rows | avg_row_length |
+-------------+------------+----------------+
|       16384 |          1 |          16384 |
+-------------+------------+----------------+
The table did not get any bigger; there was plenty of unused space within those 4 blocks. There is one row, which means avg_row_length is 16K. Clearly absurd. Add another row.
mysql> insert into foo (id) VALUES (1);
mysql> select data_length, table_rows, avg_row_length from information_schema.tables where table_name = 'foo';
+-------------+------------+----------------+
| data_length | table_rows | avg_row_length |
+-------------+------------+----------------+
|       16384 |          2 |           8192 |
+-------------+------------+----------------+
Same thing. 16K is allocated for the table and 2 rows share that space. An absurd result of 8K per row.
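The arithmetic behind those numbers can be sketched in a few lines of Python. This is only an illustration of data_length / rows using the figures from the queries above. The function name is made up for the example, and the use of integer division is my assumption about how MySQL rounds the value.

```python
def avg_row_length(data_length: int, table_rows: int) -> int:
    """Allocated bytes divided by the row count (hypothetical helper).

    Integer division is an assumption about how MySQL rounds the result.
    An empty table reports 0 rather than dividing by zero.
    """
    if table_rows == 0:
        return 0
    return data_length // table_rows


ALLOCATED = 16384  # data_length reported for the table so far, in bytes

for rows in (0, 1, 2):
    print(rows, avg_row_length(ALLOCATED, rows))
```

The allocation stays fixed while the row count grows, so the "average" is really just allocated space per row, not the size of a row.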
As I insert more and more rows, the table size stays the same, more and more of its allocated space gets used, and avg_row_length comes closer to reality.
mysql> select data_length, table_rows, avg_row_length from information_schema.tables where table_name = 'foo';
+-------------+------------+----------------+
| data_length | table_rows | avg_row_length |
+-------------+------------+----------------+
|       16384 |       2047 |              8 |
+-------------+------------+----------------+
Here we also begin to see that table_rows becomes inaccurate. I definitely inserted 2048 rows.
Now when I insert more ...
mysql> select data_length, table_rows, avg_row_length from information_schema.tables where table_name = 'foo';
+-------------+------------+----------------+
| data_length | table_rows | avg_row_length |
+-------------+------------+----------------+
|       98304 |       2560 |             38 |
+-------------+------------+----------------+
(I inserted 512 rows, and table_rows returned to reality for some reason)
MySQL decided that the table needed more space, so it resized the table and grabbed a bunch more disk space. avg_row_length jumped again.
It grabbed much more space than those 512 rows need: the table is now 96K, or 24 4K blocks, on the assumption that the space will be needed later. This minimizes the number of potentially slow reallocations it has to do, and minimizes disk fragmentation.
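To put rough numbers on that, here is a small Python sketch using the figures from the query output. The 4K block size is the one assumed in this answer, the helper name is invented for the example, and the raw-data figure only counts the 2 bytes a SMALLINT value occupies, ignoring every kind of per-row overhead.

```python
BLOCK_SIZE = 4096      # 4K disk blocks, as assumed in the text
SMALLINT_BYTES = 2     # storage size of one SMALLINT value


def blocks_used(data_length: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of blocks needed to hold data_length bytes (rounded up)."""
    return -(-data_length // block_size)


before, after, rows = 16384, 98304, 2560

print(blocks_used(before))    # blocks allocated before the resize
print(blocks_used(after))     # blocks allocated after the resize
print(after // rows)          # allocated bytes per row after the resize
print(rows * SMALLINT_BYTES)  # bytes of raw column data in those rows
```

The raw column data comes to only 5120 bytes, while the table holds 98304 bytes on disk, which is why avg_row_length says very little about the size of an individual row at this scale.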
Growing the table does not mean that all of the previous space was filled. It just means that MySQL thought the table was full enough to need more space to operate efficiently. If you want an idea of why, look at how a hash table works. I don't know whether InnoDB uses a hash table, but the principle applies: some data structures work best when they have some empty space.
The disk space used by a table is directly related to the number of rows and the column types in the table, but the exact formula is hard to pin down and will change from one MySQL version to the next. It's best to do some empirical tests and accept that you will never get an exact number.