The Postgres docs say

For best optimization results, you should label your functions with the strictest volatility category that is valid for them.

However, I seem to have an example where this is not the case, and I would like to understand what is happening. (Background: I am running postgres 9.2.)
I often need to convert times, expressed as integer seconds since the Unix epoch, to dates. I wrote a function for this:
CREATE OR REPLACE FUNCTION to_datestamp(time_int double precision) RETURNS date AS $$
    SELECT date_trunc('day', to_timestamp($1))::date;
$$ LANGUAGE SQL;
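As a quick sanity check of what it returns (my example, not part of the original test; note that to_timestamp() yields timestamptz, so the resulting date depends on the session's TimeZone setting — this assumes UTC):

SELECT to_datestamp(1262304000);
-- 2010-01-01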
Let's compare its performance against otherwise identical functions whose volatility is declared IMMUTABLE and STABLE:
CREATE OR REPLACE FUNCTION to_datestamp_immutable(time_int double precision) RETURNS date AS $$
    SELECT date_trunc('day', to_timestamp($1))::date;
$$ LANGUAGE SQL IMMUTABLE;

CREATE OR REPLACE FUNCTION to_datestamp_stable(time_int double precision) RETURNS date AS $$
    SELECT date_trunc('day', to_timestamp($1))::date;
$$ LANGUAGE SQL STABLE;
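(To confirm the labels took effect — a catalog check I'm adding here, not part of the original test — the declared volatility of each variant can be read from pg_proc, where 'v' = volatile, 's' = stable, 'i' = immutable:)

SELECT proname, provolatile FROM pg_proc WHERE proname LIKE 'to_datestamp%';
-- to_datestamp             v
-- to_datestamp_immutable   i
-- to_datestamp_stable      s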
To test this, I'll create a table of 10^6 random numbers corresponding to times between 2010-01-01 and 2015-01-01:
CREATE TEMPORARY TABLE random_times AS
    SELECT 1262304000 + round(random() * 157766400) AS time_int
    FROM generate_series(1, 1000000) x;
Finally, I'll time each of the functions on this table. On my particular machine, the original takes ~6 seconds, the immutable version takes ~33 seconds, and the stable version takes ~6 seconds.
EXPLAIN ANALYZE SELECT to_datestamp(time_int) FROM random_times;
Seq Scan on random_times  (cost=0.00..20996.62 rows=946950 width=8) (actual time=0.150..5493.722 rows=1000000 loops=1)
Total runtime: 6258.827 ms

EXPLAIN ANALYZE SELECT to_datestamp_immutable(time_int) FROM random_times;
Seq Scan on random_times  (cost=0.00..250632.00 rows=946950 width=8) (actual time=0.211..32209.964 rows=1000000 loops=1)
Total runtime: 33060.918 ms

EXPLAIN ANALYZE SELECT to_datestamp_stable(time_int) FROM random_times;
Seq Scan on random_times  (cost=0.00..20996.62 rows=946950 width=8) (actual time=0.086..5295.608 rows=1000000 loops=1)
Total runtime: 6063.498 ms
What's going on here? For example, is postgres spending time caching results even though that isn't actually useful, since the arguments passed to the function are unlikely to ever be repeated?
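One way to see what the planner is doing — a diagnostic I'm adding here, on the understanding that EXPLAIN VERBOSE prints each plan node's output expressions — is to check whether the function body was inlined into the query. An inlined call shows up as the expanded expression, while a non-inlined call shows up as the function itself:

EXPLAIN VERBOSE SELECT to_datestamp(time_int) FROM random_times;
--   Output: (date_trunc('day'::text, to_timestamp(time_int)))::date

EXPLAIN VERBOSE SELECT to_datestamp_immutable(time_int) FROM random_times;
--   Output: to_datestamp_immutable(time_int)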
Thanks!
UPDATE
Thanks to Craig Ringer, this has been discussed on the pgsql-performance mailing list. Highlights:
Tom Lane says
[ shrug... ] Using IMMUTABLE to lie about the mutability of a function (in this case, date_trunc) is a bad idea. It's likely to lead to wrong answers, not to mention performance issues. In this particular case, I imagine the performance problem comes from having suppressed the option to inline the function body ... but you should be more worried about whether you aren't getting bogus answers in other cases.
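(To make the "wrong answers" point concrete — my illustration, not from the thread — the function body depends on the session's TimeZone setting, because to_timestamp() returns timestamptz. The same input can therefore produce different dates in different sessions, which is exactly what IMMUTABLE promises cannot happen:)

SET timezone = 'UTC';
SELECT to_datestamp_immutable(1262304000);  -- 2010-01-01

SET timezone = 'America/Los_Angeles';
SELECT to_datestamp_immutable(1262304000);  -- 2009-12-31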
Pavel Stehule says
If I understand correctly, the IMMUTABLE flag used here disables inlining. What you see is the overhead of SQL function evaluation. My rule: don't use volatility flags on SQL functions when possible.
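Following that advice, here is a sketch of a variant that really is immutable (my addition; the name to_datestamp_utc is made up, and it assumes the integer times are meant to be interpreted as UTC). Pinning the conversion with AT TIME ZONE 'UTC' removes the dependency on the session's TimeZone setting, so the IMMUTABLE label becomes truthful and no longer suppresses inlining:

CREATE OR REPLACE FUNCTION to_datestamp_utc(time_int double precision) RETURNS date AS $$
    -- to_timestamp() is immutable; AT TIME ZONE 'UTC' pins the conversion,
    -- so the result no longer depends on the session's TimeZone setting.
    SELECT (to_timestamp($1) AT TIME ZONE 'UTC')::date;
$$ LANGUAGE SQL IMMUTABLE;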