Query Time Statistics (PostgreSQL)

I have a table with a billion rows, and I would like to determine the average execution time and standard deviation of the execution time for a set of queries of the form:

 select * from mytable where col1 = '36e2ae77-43fa-4efa-aece-cd7b8b669043';
 select * from mytable where col1 = '4b58c002-bea4-42c9-8f31-06a499cabc51';
 select * from mytable where col1 = 'b97242ae-9f6c-4f36-ad12-baee9afae194';
 ....

I have a thousand random values for col1 stored in another table.

Is there a way to record how long each of these queries takes (in milliseconds) in a separate table, so that I can run some statistics on them? Something like: for each col1 value in my table of random values, execute the query, measure the time, and save it into another table.

A completely different approach would also be fine, as long as I can stay inside PostgreSQL (i.e. I don't want to write an external program for this).

+7
performance sql postgresql
4 answers

Do you know about the EXPLAIN statement?

This command displays the execution plan that the PostgreSQL planner generates for the supplied statement. The execution plan shows how the table(s) referenced by the statement will be scanned (by plain sequential scan, index scan, etc.) and, if multiple tables are referenced, which join algorithms will be used to bring together the required rows from each input table.

The most important part of the display is the estimated statement execution cost, which is the planner's guess at how long it will take to run the statement (measured in units of disk page fetches). Actually, two numbers are shown: the start-up cost before the first row can be returned, and the total cost to return all the rows. For most queries the total cost is what matters, but in contexts such as a subquery in EXISTS, the planner will choose the smallest start-up cost instead of the smallest total cost (since the executor will stop after getting one row anyway). Also, if you limit the number of rows to return with a LIMIT clause, the planner makes an appropriate interpolation between the endpoint costs to estimate which plan is really the cheapest.

The ANALYZE option causes the statement to be actually executed, not merely planned. The total elapsed time spent within each plan node (in milliseconds) and the total number of rows it actually returned are added to the display. This is useful for seeing whether the planner's estimates are close to reality.
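For example, running it against one of the queries above reports a per-query execution time in milliseconds; the exact plan lines and numbers depend on your data and PostgreSQL version, so treat the output shape below as illustrative only:

 EXPLAIN ANALYZE
 select * from mytable where col1 = '36e2ae77-43fa-4efa-aece-cd7b8b669043';
 -- ... plan nodes with estimated and actual costs ...
 -- Execution Time: ... ms   (labelled "Total runtime" on older releases)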

You can quite easily write a script that runs EXPLAIN ANALYZE on your query for each of the random values in the table and saves the output to a file / table / etc.
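A minimal sketch of such a script that stays entirely inside PostgreSQL (reasonably recent version assumed): it uses a hypothetical table random_values with a column val for your thousand values, and a hypothetical table query_timings for the results; EXPLAIN (ANALYZE, FORMAT JSON) reports the execution time in milliseconds under the key "Execution Time" ("Total Runtime" on old releases).

 -- Hypothetical target table for the measurements.
 CREATE TABLE IF NOT EXISTS query_timings (
     col1_value  text,
     exec_ms     double precision,
     measured_at timestamptz DEFAULT now()
 );

 DO $$
 DECLARE
     v    record;
     plan json;
 BEGIN
     FOR v IN SELECT val FROM random_values LOOP
         -- ANALYZE actually runs the query; FORMAT JSON makes the
         -- execution time easy to pick out of the output.
         EXECUTE format(
             'EXPLAIN (ANALYZE, FORMAT JSON) SELECT * FROM mytable WHERE col1 = %L',
             v.val
         ) INTO plan;
         INSERT INTO query_timings (col1_value, exec_ms)
         VALUES (v.val, (plan -> 0 ->> 'Execution Time')::double precision);
     END LOOP;
 END $$;

From query_timings you can then compute avg(exec_ms) and stddev(exec_ms) directly in SQL.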

+6

You need to modify the PostgreSQL configuration file.

Enable this setting:

 log_min_duration_statement = -1  # -1 is disabled, 0 logs all statements
                                  # and their durations, > 0 logs only
                                  # statements running at least this number
                                  # of milliseconds

After that, statement execution times will be written to the server log, and you can see exactly how bad (or good) your queries are.
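If you only want this for a single benchmarking session rather than server-wide, a minimal sketch (assuming your role is allowed to change this setting, which normally requires superuser rights):

 SET log_min_duration_statement = 0;   -- log the duration of every statement

 select * from mytable where col1 = '36e2ae77-43fa-4efa-aece-cd7b8b669043';
 -- the server log then gets a line like:
 -- LOG:  duration: 12.345 ms  statement: select * from mytable where col1 = ...

 RESET log_min_duration_statement;     -- back to the server default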

You can also run the log through a log-parsing utility such as pgfouine to get nice HTML output for further analysis.

+11

Directly, no, you can't. But you can get an indirect and fairly close measurement by checking the time immediately before and immediately after the query you are interested in.

 $sql = "Your Query"; $bm = "SELECT extract(epoch FROM clock_timestamp())"; $query = "{$bm}; {$sql}; {$bm};"; 

The clock_timestamp() function gives you the actual server time at the moment the statement starts. Since that SELECT touches no tables, we can expect it to be almost instantaneous. I assume any Pg driver offers support for multiple queries; it is important that these 3 queries (the real one plus the 2 extras) are sent together, otherwise you would also be measuring data transfer times...

In PHP I have a function that handles this. Reduced to the essentials, it looks like this:

 <?php function pgquery($sql, $conn) { // Prepend and append benchmarking queries $bm = "SELECT extract(epoch FROM clock_timestamp())"; $query = "{$bm}; {$sql}; {$bm};"; // Execute the query, and time it (data transport included) $ini = microtime(true); pg_send_query($conn, $query); while ($resource = pg_get_result($conn)) { $resources[] = $resource; } $end = microtime(true); // "Extract" the benchmarking results $q_ini = pg_fetch_row(array_shift($resources)); $q_end = pg_fetch_row(array_pop($resources)); // Compute times $time = round($end - $ini, 4); # Total time (inc. transport) $q_time = round($q_end[0] - $q_ini[0], 4); # Query time (Pg server only) return $resources; } ?> 

I just left the basics there. $conn holds a reference to the Pg connection, and $resources ends up as an array of pg result resources (in case you sent several queries in your $sql).

$time holds the total time from the moment the query leaves for the Pg server until the result arrives back at the client. $q_time contains only the actual query time on the Pg server (or a very good approximation of it).

Add error handling and whatever other processing you like; I have plenty of it, but it is not relevant to your question.

+2

You CANNOT do this in plain SQL, because even if you call each of these statements in a loop, every call to now() will return the same result, since you are inside a single transaction.

It becomes possible if you use your own volatile now()-style function that returns a different value on each call (clock_timestamp() already behaves this way).
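A minimal sketch along those lines, using clock_timestamp() and the same hypothetical random_values / query_timings tables as in the earlier sketch:

 CREATE TABLE IF NOT EXISTS query_timings (
     col1_value  text,
     exec_ms     double precision,
     measured_at timestamptz DEFAULT now()
 );

 DO $$
 DECLARE
     v  record;
     t0 timestamptz;
     t1 timestamptz;
 BEGIN
     FOR v IN SELECT val FROM random_values LOOP
         t0 := clock_timestamp();                    -- volatile: fresh value on every call
         PERFORM * FROM mytable WHERE col1 = v.val;  -- run the query, discard the rows
         t1 := clock_timestamp();
         INSERT INTO query_timings (col1_value, exec_ms)
         VALUES (v.val, 1000 * extract(epoch FROM (t1 - t0)));
     END LOOP;
 END $$;

avg(exec_ms) and stddev(exec_ms) over query_timings then give the statistics you are after. Note that this measures the statement as run from PL/pgSQL, so it includes a little per-iteration overhead but no client-side transfer time.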

0
