How to calculate the size (in bytes) of a subset of rows from a MySQL table using PHP with PDO?

First of all, I am working on a shared hosting service with PHP 5.4.11 with the PDO extension and MySQL 5.1.66 (Debian Squeeze).

I am currently developing a service in which users have a limited quota for storing data in a database. At the moment, there is only one table storing user data that counts towards the quota (but this can change). All tables use the InnoDB storage engine and the utf8_unicode_ci collation for text columns. Suppose the table subject to the quota has the following columns:

    +--------------+-----------+
    | Column name  | Type      |
    +--------------+-----------+
    | id           | int       |
    | userId       | int       |
    | created      | timestamp |
    | lastModified | timestamp |
    | description  | varchar   |
    | content      | text      |
    +--------------+-----------+

Now I need to calculate the size in bytes of all rows belonging to a specific user. I searched the documentation and Google, but only found other people asking similar questions without receiving a satisfactory answer.

I know about MySQL's LENGTH() function, but since it is a string function, it does not return the space occupied by fixed-size numeric and date/time fields. And if I counted only the string fields, a user could simply fill the database with empty rows without ever reaching their quota. I also know that MySQL has some storage overhead for each row, but I do not want to include it in the calculation. (By analogy, I want the actual file size, not the size on disk.)

In addition, I do not want to depend on the specific structure of the table, as it may change and I would then have to remember to update the function that calculates the quota.

Due to the lack of an existing solution, I came up with my own (see below). But it has some disadvantages, for example:

  • It needs a list of the data types used in the table and their corresponding sizes.
  • It cannot accurately process the data types FLOAT(p), DECIMAL(M,D), NUMERIC(M,D) and BIT(M) (although this could be implemented).
  • It needs two separate queries.

So, here is what I came up with:

    $db = new PDO(...);
    $tablename = 'users';
    $userId = 1;

    // Make a list of type sizes in bytes - null indicates string types of
    // varying size. If there is a type used in the database which is not
    // listed here, an exception will be thrown.
    $typeSizes = array(
        'int'       => 4,
        'timestamp' => 4,
        'varchar'   => null,
        'text'      => null
    );

    // Get datatypes used in the table.
    $sql = 'SELECT COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS '
         . 'WHERE TABLE_NAME=?';
    $stmt = $db->prepare($sql);
    $stmt->bindValue(1, $tablename);
    $stmt->execute();
    $colTypes = array_map('reset', array_map('reset',
        $stmt->fetchAll(PDO::FETCH_GROUP|PDO::FETCH_ASSOC)));

    // Iterate over the existing columns. Sum up sizes of fixed size columns to
    // get a 'fixed-size-factor' for a row. Make a list of varying size columns.
    $fixedSizeFactor = 0;
    $varyingSizeCols = array();
    foreach ($colTypes as $colName => $colType) {
        if (array_key_exists($colType, $typeSizes)) {
            if ($typeSizes[$colType] !== null) {
                $fixedSizeFactor += $typeSizes[$colType];
            } else {
                $varyingSizeCols[] = $colName;
            }
        } else {
            $msg = "Unhandled column type '$colType' - unable to calculate used "
                 . 'storage. Probably the $typeSizes array needs to be updated.';
            throw new Exception($msg);
        }
    }

    // Get number of all records of the user and the size of his data in
    // varying size columns.
    $sumArgument = 0;
    if (count($varyingSizeCols) > 0) {
        $sumArgument = 'LENGTH(' . implode(') + LENGTH(', $varyingSizeCols) . ')';
    }
    $sql = 'SELECT SUM(' . $sumArgument . ') AS size, COUNT(*) AS count FROM '
         . $tablename . ' WHERE userId=?';
    $stmt = $db->prepare($sql);
    $stmt->bindValue(1, $userId);
    $stmt->execute();
    $result = $stmt->fetch(PDO::FETCH_ASSOC);

    // Calculate used storage.
    $usedStorage = $result['count'] * $fixedSizeFactor + $result['size'];
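The parameterized types listed above as unhandled could probably be covered by a small helper. The following is only a sketch: the function name `parameterizedTypeSize` is hypothetical, it assumes the type string is read from the COLUMN_TYPE column of INFORMATION_SCHEMA.COLUMNS (e.g. "decimal(10,2)") rather than DATA_TYPE, and the byte counts follow the data type storage requirements in the MySQL manual (DECIMAL packs 9 digits into 4 bytes, BIT(M) needs about (M+7)/8 bytes, FLOAT(p) needs 4 bytes up to p = 24 and 8 bytes above).

```php
<?php
// Hypothetical helper (not part of the code above): storage size in bytes
// for parameterized fixed-size types, given a COLUMN_TYPE string such as
// "decimal(10,2)" or "bit(12)". Returns null for any type it does not handle.
function parameterizedTypeSize($columnType)
{
    // Bytes needed for 0..8 leftover decimal digits (per the MySQL manual).
    $leftoverBytes = array(0, 1, 1, 2, 2, 3, 3, 4, 4);

    if (!preg_match('/^(\w+)\((\d+)(?:,(\d+))?\)/', strtolower($columnType), $m)) {
        return null;
    }
    $type   = $m[1];
    $first  = (int) $m[2];
    $second = isset($m[3]) ? (int) $m[3] : 0;

    switch ($type) {
        case 'decimal':
        case 'numeric':
            // Integer and fractional digits are stored separately.
            $intDigits = $first - $second;
            return (int) floor($intDigits / 9) * 4 + $leftoverBytes[$intDigits % 9]
                 + (int) floor($second / 9) * 4 + $leftoverBytes[$second % 9];
        case 'bit':
            return (int) floor(($first + 7) / 8);
        case 'float':
            return $first <= 24 ? 4 : 8;
        default:
            return null;
    }
}
```

A null result could then fall through to the existing `$typeSizes` lookup, so plain types such as int or timestamp would still be handled by the list.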

My question is: Is there a more "native", less hacky way to do this? If not, do you have any suggestions for optimizing performance?

1 answer

Just forget about numbers and dates; it would be really petty to limit users over such fields.

Use the LENGTH (for texts) and OCTET_LENGTH (for blobs) functions, and that should be enough. (In MySQL, both return the length in bytes.)
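As a rough sketch of this suggestion, assuming the table and column names from the question: the helper name `buildTextSizeQuery` is hypothetical, and the COALESCE wrappers are an extra assumption so that NULL columns or a user without rows yield 0 instead of NULL.

```php
<?php
// Hypothetical helper: builds a query that sums the byte length of the
// given string columns for one user. Identifiers cannot be bound as
// parameters, so they are backtick-quoted here instead.
function buildTextSizeQuery($table, array $columns)
{
    $quote = function ($identifier) {
        return '`' . str_replace('`', '``', $identifier) . '`';
    };

    $terms = array();
    foreach ($columns as $column) {
        // COALESCE keeps one NULL column from turning the row's sum into NULL.
        $terms[] = 'COALESCE(LENGTH(' . $quote($column) . '), 0)';
    }
    $sumArgument = count($terms) > 0 ? implode(' + ', $terms) : '0';

    return 'SELECT COALESCE(SUM(' . $sumArgument . '), 0) AS size FROM '
         . $quote($table) . ' WHERE `userId` = ?';
}

$sql = buildTextSizeQuery('users', array('description', 'content'));

// With a PDO connection $db (assumed to exist), the query would be run as:
// $stmt = $db->prepare($sql);
// $stmt->execute(array($userId));
// $usedBytes = (int) $stmt->fetchColumn();
```

This needs only one query and no type list, at the cost of ignoring the fixed-size columns entirely.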

If disk storage really is what you are after and you MUST account for it per user, remember to also account for the logs, which take up additional disk space; how much depends on how the user uses your database.
